1. Introduction
Point cloud data is inevitably corrupted with noise, especially given the increasing access to scanning facilities such as consumer-level depth sensors. Point cloud filtering aims to reconstruct a point set by removing noise from its corresponding noisy input. The filtered point clouds can be used in various graphics applications, for example, point set resampling, surface reconstruction, point-based rendering and computer animation. Removing noise while preserving geometric features remains a challenge in point cloud filtering.
State-of-the-art techniques have made remarkable progress in point cloud filtering. Specifically, the LOP-based methods (LOP [Lipman2007TOG], WLOP [Huang2009TOG], CLOP [Preiner2014TOG]) are robust to noise and outliers, while RIMLS [Ztireli2009CGF] and GPF [Lu2018TVCG] aim to preserve geometric features. Nevertheless, these techniques still suffer from either feature smearing or limited robustness. More precisely, the LOP-based techniques [Lipman2007TOG; Huang2009TOG; Preiner2014TOG] are not designed to preserve sharp features because of their inherently isotropic formulation. RIMLS [Ztireli2009CGF] and GPF [Lu2018TVCG] can preserve geometric features to some extent; however, they depend greatly on the underlying normal filters, which become less robust under large noise or irregular sampling. Furthermore, for GPF [Lu2018TVCG] it is not easy to find a radius that balances noise removal against gaps near edges, and the method is slow due to the considerable computation of its EM optimization. Finally, all of these methods need trial-and-error parameter tuning, which is tedious and time-consuming, to achieve the desired visual results. Point cloud filtering via deep learning has rarely been studied so far [Yu2018ECCV; Rakotosaona2019CGF], and existing approaches still produce limited results, either smoothing out sharp features or generalizing poorly. As a result, the above limitations substantially restrict the robustness and applicability of these state-of-the-art point cloud filtering techniques. We compare the main characteristics of these techniques and our method in Table 1.
In this paper, we propose a novel point cloud filtering approach to address the above issues. Motivated by the notable successes of the auto-encoder and PointNet [Schmidhuber2015NN; Qi2017CVPR] in vision tasks, we design a learning framework, referred to as Pointfilter, for point cloud filtering. In particular, our framework is an encoder-decoder based architecture which directly takes the raw neighboring points of each noisy point as input, and regresses a displacement vector that pushes the noisy point back to its ground-truth position. In designing the loss function, we also take geometric features into account so that they can be preserved by the network. Given a noisy point cloud as input, our trained model can automatically and robustly predict a corresponding high-quality point cloud, removing noise while preserving geometric features. Various experiments demonstrate that our method achieves better performance than state-of-the-art techniques (or performance comparable to optimization-based methods like RIMLS and GPF, which require trial-and-error parameter tuning), in terms of both visual quality and error metrics. Our method is fast and avoids manual parameter tuning. The main contributions of this work are:

a novel framework that achieves point cloud filtering by encoder-decoder modeling;

a powerful loss function that takes geometric features into account;

extensive experiments and analysis, both qualitative and quantitative;

the source code and involved dataset, which will be made publicly available.
2. Related Work
We first review point cloud filtering techniques, and then revisit previous works that apply deep learning to point clouds.
2.1. Point Cloud Filtering
Point cloud filtering methods can generally be classified into two types: two-step techniques and projection-based methods. The two-step framework consists of at least two steps: normal smoothing, and point position updating under the guidance of the filtered normals. Avron et al. [Avron2010TOG] and Sun et al. [Sun2015CAGD] introduced $\ell_1$ and $\ell_0$ optimization for point set filtering, respectively. Recently, a point cloud smoothing technique based on the guided filter and sharp feature skeletons was presented [Zheng2017TVC]. Zheng et al. [Zheng2018CAGD] extended the rolling guidance filter to point set filtering and designed a new point position updating strategy to overcome sharp edge shrinkage. Lu et al. [Lu2018ArXiv] proposed a two-step geometry filtering approach for both meshes and point clouds. Most point set filtering methods obtain filtered results by projecting the input point set onto the underlying point set surface. One popular category of this type is moving least squares (MLS) and its variants [Levin1998MoC; Levin2004GMfSV; Alexa2001Vis; Alexa2003TVCG; Amenta2004TOG; Fleishman2005TOG; Ztireli2009CGF]. MLS was seminally formulated by Levin [Levin1998MoC; Levin2004GMfSV], and later works defined MLS surfaces and extremal surfaces [Alexa2001Vis; Alexa2003TVCG; Amenta2004TOG]. Two further variants have been presented for projection: statistics-based and robust implicit moving least squares (RIMLS) [Fleishman2005TOG; Ztireli2009CGF]. Lange et al. [Lange2005CAGD] developed a method for anisotropic fairing of a point-sampled surface using an anisotropic geometric mean curvature flow. Recently, LOP (locally optimal projection) based methods have become increasingly popular. For example, Lipman et al. [Lipman2007TOG] proposed the locally optimal projection operator (LOP), which is parameterization free. Later, Huang et al. [Huang2009TOG] presented a weighted LOP (WLOP), which enhances the uniform distribution of the input points. A kernel LOP has also been proposed to speed up the computation of LOP [Liao2013CAD]. More recently, a continuous LOP (CLOP) was presented, which reformulates the data term as a continuous representation of the input point set and achieves fast speed [Preiner2014TOG]. Note that a few projection-based methods utilize smoothed normals as a prior to preserve geometric features, such as EAR [Huang2013TOG] and GPF [Lu2018TVCG].
2.2. Deep Learning on Point Clouds
Qi et al. [Qi2017CVPR] proposed the pioneering network architecture PointNet, which can consume raw point clouds directly. The key ingredient of PointNet is the use of fully connected layers (MLPs) to extract features instead of convolution operators, which are not suitable for irregular domains. Although PointNet achieves remarkable success in shape classification and segmentation, its point-wise features fail to characterize local structures, which are crucial for high-level semantic understanding. To address this, an improved version, PointNet++ [Qi2017NIPS], was proposed to aggregate local structures in a hierarchical way. Following PointNet and PointNet++, many network architectures operating on raw point clouds emerged. For instance, based on a dynamic local neighborhood graph structure, Wang et al. [Wang2019TOG] designed an EdgeConv block to capture relationships in both spatial and feature space. Meanwhile, an alternative convolutional framework, SpiderCNN [Xu2018ECCV], was proposed to aggregate neighboring features by a special family of parameterized weight functions instead of MLPs. Inspired by the Scale-Invariant Feature Transform (SIFT) [Lowe2004IJCV], a robust 2D representation, a SIFT-like module [Jiang2018ArXiv] was developed to encode information at different orientations and scales, and can be flexibly incorporated into PointNet-style networks. Beyond shape classification and segmentation, a few point-based network architectures have been applied to upsampling [Yu2018CVPR; Yifan2019CVPR], local shape property estimation [Guerrero2018CGF], and so on. As for point cloud filtering, Roveri et al. [Roveri2018CGF] proposed a filtering network, PointProNet, designed for consolidating raw point clouds corrupted with noise. Benefiting from powerful 2D convolutions, PointProNet transfers 3D point cloud consolidation into 2D height map filtering. To preserve sharp edges while filtering, Yu et al. [Yu2018ECCV] introduced a novel edge-aware network architecture, EC-Net, by incorporating a joint loss function. Recently, a two-stage network architecture, PointCleanNet (PCN) [Rakotosaona2019CGF], was developed for removing outliers and denoising separately.
3. Method
3.1. Overview
Given a noisy point cloud, we aim to restore its clean version with our Pointfilter in a supervised manner. Before introducing the details of our Pointfilter framework, we first formulate the noisy point cloud as follows:
$$\mathbf{P} = \hat{\mathbf{P}} + \mathbf{N}, \quad (1)$$
where $\mathbf{P}$ is an observed point cloud corrupted with noise, $\hat{\mathbf{P}}$ is the corresponding clean point cloud (underlying surface) and $\mathbf{N}$ is the additive noise. In this work, we address the filtering problem in a local way, meaning that the filtered result of a noisy point only depends on its neighboring structure. Point cloud filtering is an ill-posed problem, and it is difficult to directly regress the additive noise for each noisy point as in image filtering. As an alternative, we handle point cloud filtering by projecting each noisy point onto the underlying surface. More specifically, we treat the additive noise $\mathbf{N}$ as displacement vectors between the noisy point cloud $\mathbf{P}$ and the clean point cloud $\hat{\mathbf{P}}$, and learn the displacement vector for each noisy point. To achieve this, we propose an encoder-decoder network, named Pointfilter, to regress the additive noise $\mathbf{N}$, as shown in Fig. 1. We briefly introduce a preprocessing step for the input data in Section 3.2, and then show how to model our Pointfilter in Section 3.3. We finally explain how we train our network in Section 3.4 and how we make inference with the trained network in Section 3.5.
3.2. Preprocessing
Given a pair of point clouds $\mathbf{P}$ and $\hat{\mathbf{P}}$, the noisy patch $\mathbb{P}_i$ and its corresponding ground-truth patch $\hat{\mathbb{P}}_i$ are defined as follows:
$$\mathbb{P}_i = \{ p_j \in \mathbf{P} \mid \|p_j - p_i\| < r \}, \qquad \hat{\mathbb{P}}_i = \{ \hat{p}_j \in \hat{\mathbf{P}} \mid \|\hat{p}_j - p_i\| < r \}, \quad (2)$$
where $p_i \in \mathbf{P}$ is the current noisy point, $\hat{p}_j \in \hat{\mathbf{P}}$, and $r$ is the patch radius. Once patches are generated, two issues need to be addressed in point cloud filtering: (1) how to avoid unnecessary degrees of freedom in the observed space, and (2) how to make our Pointfilter insensitive to certain geometric transformations (e.g., rigid transformations). For the first issue, an immediate remedy is to translate each patch to the origin and then scale it to unit length, i.e., $(p_j - p_i)/r$; the ground-truth patch is processed in the same way, i.e., $(\hat{p}_j - p_i)/r$. To be invariant to rigid transformations (e.g., rotations), a few methods [Qi2017CVPR; Qi2017NIPS] attempted to predict a rotation matrix via an additional spatial transformer network, but this has been shown to be fragile to rotations without massive data augmentation [you2018ArXiv]. In this work, we align the input patches by aligning their principal PCA axis with the Z-axis. The alignment process is illustrated in Fig. 2. To tune network parameters effectively with batch processing, the number of points in each input patch should be the same; we fix this number empirically. We pad patches having insufficient points with the origin, and randomly subsample patches with more than enough points. For patch generation, the patch radius defaults to a fixed fraction of the model's bounding box diagonal.
3.3. The Pointfilter Framework
The architecture of our point cloud filtering framework is shown in Fig. 1. The key idea of our Pointfilter is to project each noisy point onto the underlying surface according to its neighboring structure. To achieve this, we design the Pointfilter as an encoder-decoder network. Specifically, the encoder consists of two main parts: (1) feature extractors (i.e., MLPs) that extract features at different scales, and (2) a collector that aggregates these features into a latent vector. The encoder module attempts to obtain a compact representation of an input patch. In the decoder module, a regressor is employed to evaluate the displacement vector from the latent representation vector. We adopt the recent PointNet [Qi2017CVPR] as the backbone of our Pointfilter. In practice, the extractors and the collector are realised by shared MLPs and a max pooling layer, respectively, and the regressor is constructed from three fully connected layers. Details of the network are shown in Fig. 1. At the beginning of our Pointfilter, a PCA-induced rotation matrix is applied to transform the input patch to a canonical space; accordingly, at the end of our Pointfilter, the inverse matrix is multiplied by the evaluated displacement vector to obtain the final displacement vector.
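The encoder-decoder just described can be sketched in PyTorch as follows. The layer widths, the patch size of 500 points and the use of three shared-MLP stages are illustrative assumptions, not the authors' exact configuration (which is given in Fig. 1):

```python
import torch
import torch.nn as nn

class Pointfilter(nn.Module):
    """Hypothetical sketch of the Pointfilter encoder-decoder.

    Input: a preprocessed patch of N points, shape (B, N, 3), already
    aligned to the canonical space. Output: a unit-scale displacement
    vector per patch, shape (B, 3).
    """
    def __init__(self):
        super().__init__()
        # Shared-MLP feature extractors (PointNet-style), realised as
        # 1x1 convolutions over the point dimension.
        self.extractor = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Regressor: three fully connected layers; the final TanH keeps
        # the displacement inside the unit-scaled patch.
        self.regressor = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Tanh(),
        )

    def forward(self, patch):                          # patch: (B, N, 3)
        feat = self.extractor(patch.transpose(1, 2))   # (B, 1024, N)
        latent = torch.max(feat, dim=2).values         # collector: max pool
        return self.regressor(latent)                  # (B, 3)
```

The max pooling "collector" makes the latent vector invariant to the ordering of points in the patch, which is why a PointNet-style backbone suits raw point input.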
Loss function. To enable the filtered point cloud to approximate the underlying surface while preserving sharp features, the loss function should be carefully defined. A simple option for measuring the filtered point cloud would be the $L_2$ distance, which has been used in [Rakotosaona2019CGF]. As shown in Fig. 4, compared with the $L_2$-based distance (Fig. 4 (b)), which is sampling dependent, a more general alternative is to project noisy points onto the underlying surface (Fig. 4 (c), (d)). Moreover, the $L_2$-based distance can hardly retain sharp features in point cloud filtering. Thus, the loss should be capable of measuring the projection distance. Inspired by [Kolluri2008TOA], our projection loss is defined as
$$L_{proj} = \frac{\sum_{\hat{p}_j \in \hat{\mathbb{P}}_i} \left| (\tilde{p}_i - \hat{p}_j) \cdot \hat{n}_j \right| \, \phi(\hat{p}_j)}{\sum_{\hat{p}_j \in \hat{\mathbb{P}}_i} \phi(\hat{p}_j)}, \quad (3)$$
where $\tilde{p}_i$ is the filtered point of the noisy point $p_i$, and $\hat{n}_j$ is the ground-truth normal of the point $\hat{p}_j$. $\phi(\hat{p}_j)$ is a Gaussian function giving larger weights to points near $\tilde{p}_i$, which is defined as
$$\phi(\hat{p}_j) = \exp\!\left( -\frac{\|\hat{p}_j - \tilde{p}_i\|^2}{\sigma_p^2} \right), \quad (4)$$
where $\sigma_p$ is the support radius, normally defined as $\sigma_p = 4\sqrt{d_{bb}/m}$. Here, $d_{bb}$ is the diagonal length of the bounding box of the ground-truth patch and $m$ is its number of points [Huang2009TOG]. Besides approximating the underlying surface, we also assume that the filtered point cloud should have a relatively regular distribution. To this end, a repulsion term $L_{rep}$ is added to mitigate point aggregation. The whole loss function is therefore formulated as:
$$L = L_{proj} + \alpha L_{rep}, \quad (5)$$
where $\alpha$ is a trade-off factor that controls the repulsion force in the filtering process; we set its value empirically in the training stage. However, we observed that the above projection loss generates gaps near sharp features, since the normal-based distance increases with geometric dissimilarity by definition (see Fig. 4 (c)). Although the regular loss function can alleviate gaps to some extent, it still fails to preserve sharp features during filtering (see Fig. 5 (b)). We address this issue by considering normal similarity in our loss function, introducing a bilateral mechanism into the projection distance formula (Eq. (3)). Specifically, the function $\theta$ is defined as the normal similarity between the current point and its neighboring points in the patch. For simplicity, we use $\theta(\tilde{n}_i, \hat{n}_j) = \exp\!\left( -\left( \frac{1 - \tilde{n}_i \cdot \hat{n}_j}{1 - \cos \sigma_n} \right)^2 \right)$ [Huang2013TOG], where $\tilde{n}_i$ is the normal of the filtered point and $\sigma_n$ is the support angle, set to its default value. Thus, our final projection loss is defined as
$$L_{proj} = \frac{\sum_{\hat{p}_j \in \hat{\mathbb{P}}_i} \left| (\tilde{p}_i - \hat{p}_j) \cdot \hat{n}_j \right| \, \phi(\hat{p}_j)\, \theta(\tilde{n}_i, \hat{n}_j)}{\sum_{\hat{p}_j \in \hat{\mathbb{P}}_i} \phi(\hat{p}_j)\, \theta(\tilde{n}_i, \hat{n}_j)}. \quad (6)$$
For efficiency and simplicity, the normal of the filtered point is assigned the normal of the ground-truth point nearest to the filtered point. Note that our Pointfilter only requires ground-truth point normals in the training stage.
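For a single filtered point, the loss of Eqs. (3)-(6) can be sketched as below. The constants `alpha`, `sigma_n` and the support radius, as well as the repulsion stand-in, are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def pointfilter_loss(p_f, n_f, patch_gt, normals_gt, radius=1.0,
                     alpha=0.1, sigma_n=np.radians(15.0)):
    """Sketch of Eqs. (3)-(6) for one filtered point p_f with normal n_f.

    patch_gt: (M, 3) ground-truth patch points; normals_gt: (M, 3) their
    normals. `alpha`, `sigma_n` and `sigma_p` are assumed values. The
    repulsion term here simply penalises large displacements; the
    paper's actual repulsion term (which encourages a regular point
    distribution) is not reproduced.
    """
    diff = p_f - patch_gt                               # (M, 3)
    dist = np.linalg.norm(diff, axis=1)
    sigma_p = radius / 2.0                              # assumed support radius
    phi = np.exp(-dist ** 2 / sigma_p ** 2)             # spatial Gaussian, Eq. (4)
    # Bilateral normal-similarity weight theta (EAR-style), Eq. (6).
    cos = np.clip(normals_gt @ n_f, -1.0, 1.0)
    theta = np.exp(-((1.0 - cos) / (1.0 - np.cos(sigma_n))) ** 2)
    w = phi * theta
    proj = np.abs(np.sum(diff * normals_gt, axis=1))    # point-to-plane distances
    l_proj = np.sum(w * proj) / (np.sum(w) + 1e-12)     # weighted projection loss
    l_rep = np.linalg.norm(p_f)                         # stand-in repulsion term
    return l_proj + alpha * l_rep                       # Eq. (5)
```

When the filtered point lies exactly on the local tangent planes of its ground-truth neighbours, the projection term vanishes, which is the behaviour Eq. (3) is designed to reward.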
We chose the encoder-decoder structure because: (1) it is stable and mature; (2) it can learn complex yet compact representations of point clouds; (3) the learned latent representations are helpful for regressing the displacement vector from the input noisy patch.
3.4. Network Training
Our Pointfilter is implemented in PyTorch on a desktop machine with an Intel Core i7-8750H CPU (2.20 GHz, 16 GB memory) and a GeForce GTX 1060 GPU (6 GB memory, CUDA 9.0). The number of training epochs is fixed in advance. SGD is used as our optimizer, with the learning rate decreased from 1e-4 to 1e-8. The mini-batch size is 64, and batch normalization [Ioffe2015ArXiv], ReLU [Nair2010ICML] and TanH are used in our Pointfilter framework.
3.5. Network Inference
Given a trained Pointfilter, our approach filters a noisy point cloud in a point-wise way. First, we build a patch for each noisy point and transform the patch to a canonical space following Section 3.2. Second, each preprocessed patch is fed into the trained Pointfilter to infer a displacement vector. Finally, the displacement vector evaluated by our Pointfilter is mapped back to the original space. The inference can be formulated as follows:
$$\tilde{p}_i = p_i + r \, R^{-1} f(\mathbb{P}_i), \quad (7)$$
where $\tilde{p}_i$ and $p_i$ are the filtered point and the noisy point, respectively, $f$ represents our Pointfilter, $R$ is the PCA-induced rotation matrix, and $r$ is the patch radius. To obtain better filtered results, we adopt multiple iterations of inference to progressively filter the noisy point cloud, especially for point clouds corrupted with large noise.
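The point-wise inference loop can be sketched as follows. Here `model_fn` is a placeholder for the trained Pointfilter applied to a preprocessed patch, the brute-force neighborhood search stands in for a proper spatial index, and the PCA alignment convention is a simplifying assumption:

```python
import numpy as np

def filter_point_cloud(noisy, model_fn, radius, n_iters=2):
    """Sketch of the inference of Eq. (7), iterated n_iters times.

    noisy: (N, 3) point cloud; model_fn: maps a centred, unit-scaled,
    PCA-aligned patch (M, 3) to a unit-scale displacement (3,).
    Assumes each patch contains at least two points.
    """
    pts = noisy.copy()
    for _ in range(n_iters):
        filtered = np.empty_like(pts)
        for i, p in enumerate(pts):
            # Patch: neighbours within the patch radius, centred + scaled.
            nbrs = pts[np.linalg.norm(pts - p, axis=1) < radius]
            local = (nbrs - p) / radius
            # PCA rotation to the canonical space.
            _, eigvec = np.linalg.eigh(np.cov(local.T))
            rot = eigvec[:, ::-1].T                 # rows: principal axes
            d = np.asarray(model_fn(local @ rot.T)) # unit-scale displacement
            # Eq. (7): invert the rotation and the scaling.
            filtered[i] = p + radius * (rot.T @ d)
        pts = filtered
    return pts
```

Running several iterations corresponds to the progressive filtering mentioned above: each pass operates on the output of the previous one.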
4. Experimental Results
4.1. Dataset
As a supervised learning method, we prepare a training dataset consisting of 3D clean models (both CAD and non-CAD models), shown in Fig. 6. Each model is generated by randomly sampling points from its original surface. Given a clean model, its corresponding noisy models are synthesized by adding Gaussian noise with standard deviations set to several fractions of the clean model's bounding box diagonal. These models form our final training dataset; we do not augment any data on-the-fly during training. Besides, normal information for the clean models is required for training, as indicated in Eq. (6). To demonstrate the generalization of the proposed Pointfilter, our test dataset includes both synthesized noisy models and raw-scan models, which are examined in the following experiments (Sections 4.4 and 4.5).
4.2. Compared Techniques
We compare our approach with state-of-the-art point cloud filtering techniques, namely WLOP [Huang2009TOG], CLOP [Preiner2014TOG], RIMLS [Ztireli2009CGF], GPF [Lu2018TVCG], EC-Net [Yu2018ECCV] and PointCleanNet (PCN) [Rakotosaona2019CGF]. Specifically, RIMLS and GPF are designed to preserve sharp features by incorporating smoothed normals. For fair comparisons and visualization purposes, we (i) tune the main parameters of each state-of-the-art technique to achieve the best visual results we can (EC-Net, PCN and our method have fixed parameters), and (ii) employ the same surface reconstruction parameters for the same model. Note that surface reconstruction is applied directly to the filtered point sets. For PCN, we use the source code released by the authors to train a new model on our training dataset. Since EC-Net requires manually labeled polylines of sharp edges for training, which is infeasible on our training dataset, we simply use the trained model released by the authors. We compare our method with these methods both qualitatively and quantitatively.
4.3. Evaluation Metric
To analyze the performance of our Pointfilter quantitatively, evaluation metrics need to be defined. The Pointfilter aims to project noisy points onto the underlying surface, so it is intuitive to evaluate distance errors by averaging the distances between a point in the ground truth and its closest points in the filtered point cloud [Lu2018TVCG]. The distance error between two models is defined as
$$D_{mse} = \frac{1}{|\hat{\mathbf{P}}|} \sum_{\hat{p}_i \in \hat{\mathbf{P}}} \frac{1}{k} \sum_{\tilde{p}_j \in N_k(\hat{p}_i)} \|\hat{p}_i - \tilde{p}_j\|, \quad (8)$$
where $\hat{p}_i$ is a ground-truth point, $\tilde{p}_j$ is one of its neighboring points in the filtered point cloud, and $N_k(\hat{p}_i)$ represents its $k$ nearest neighbors; we set $k$ empirically. Inspired by [Fan2017CVPR], we also introduce the Chamfer distance (CD) to evaluate the error between the filtered point cloud and its corresponding ground truth (clean point cloud). CD is defined as
$$D_{cd} = \frac{1}{|\hat{\mathbf{P}}|} \sum_{\hat{p} \in \hat{\mathbf{P}}} \min_{\tilde{p} \in \tilde{\mathbf{P}}} \|\hat{p} - \tilde{p}\|^2 + \frac{1}{|\tilde{\mathbf{P}}|} \sum_{\tilde{p} \in \tilde{\mathbf{P}}} \min_{\hat{p} \in \hat{\mathbf{P}}} \|\tilde{p} - \hat{p}\|^2, \quad (9)$$
where $|\hat{\mathbf{P}}|$ and $|\tilde{\mathbf{P}}|$ represent the cardinalities of the clean point cloud $\hat{\mathbf{P}}$ and the filtered point cloud $\tilde{\mathbf{P}}$, respectively. The CD metric finds the nearest neighbor in the other set and sums up the squared distances. It can be viewed as an indicator of the "similarity" between two point sets, and it can easily be implemented in parallel.
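Both metrics can be sketched as follows; the value `k=10` in the distance-error metric is an assumption, since the exact value is elided in this copy:

```python
import numpy as np

def mse_error(gt, filtered, k=10):
    """Eq. (8) sketch: for each ground-truth point, average the distances
    to its k nearest filtered points, then average over all GT points.
    k=10 is an assumed value."""
    d = np.linalg.norm(gt[:, None, :] - filtered[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean()

def chamfer_distance(gt, filtered):
    """Eq. (9) sketch: symmetric nearest-neighbour squared distances,
    each term normalised by the cardinality of its point set."""
    d = np.linalg.norm(gt[:, None, :] - filtered[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).mean() + (d.min(axis=0) ** 2).mean()
```

The dense pairwise-distance matrix keeps the sketch short; for large clouds a k-d tree would replace it.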
4.4. Visual Comparisons
Point clouds with synthetic noise. The synthetic noise level is expressed relative to the diagonal length of the bounding box; a given level denotes a fraction of the diagonal length. As shown in Fig. 3, we test four CAD models (Boxunion, Cube, Fandisk and Tetrahedron) with synthetic noise. Compared with the state-of-the-art point cloud filtering techniques, our Pointfilter generates visually better results in terms of noise removal and feature preservation. Note that RIMLS and GPF can also preserve sharp features to some extent; however, they depend greatly on the capability of normal filters, which become less robust under large noise. Compared to RIMLS and GPF, we elegantly sidestep the normal filtering issue, since our framework requires the easily obtained ground-truth normals for training only. As shown in Fig. 10, RIMLS and GPF produce less desirable results when handling heavier noise, for example, obvious gaps along sharp edges (Fig. 10 (b)) and striking outliers (Fig. 10 (c)). By contrast, our Pointfilter still preserves sharp features. Since outliers remain in the RIMLS and GPF results, their filtered point clouds involve shrinkage to some extent. Although WLOP and CLOP are good at generating smooth results, they still fail to retain sharp features. EC-Net generates less pleasing results in terms of noise removal. Besides CAD models, we also test non-CAD models corrupted with synthetic noise. As shown in Fig. 7, our Pointfilter also outputs visually decent results while preserving geometric features. The reconstruction results of WLOP and CLOP have slight shrinkage in the arms and legs (Fig. 8 (b) and (c)).
Point clouds with raw noise. We also evaluate our Pointfilter on raw scanned point clouds corrupted with raw noise. Since the ground-truth models of these raw scanned point sets are not available, we demonstrate visual comparisons with other methods, as suggested by previous techniques [Lu2018TVCG]. Note that we do not retrain our Pointfilter for this type of raw noise. From Fig. 11, we see that the result of our method is better than those of the state-of-the-art techniques. Besides noise removal, our Pointfilter is capable of retaining geometric features, which are marked by the yellow box and black arrows in Fig. 11. Fig. 13 shows that our approach yields a better enhancement of surface reconstruction quality in terms of preserving geometric features. Fig. 12 shows a virtually scanned point cloud model; compared with other filtering methods, our Pointfilter still produces higher-quality results in terms of preserving sharp edges. In addition, we also test our Pointfilter on scene-level models from the Paris-rue-Madame database [Serna2014ICPRAM] (see Fig. 18 and Fig. 19). As can be seen from these figures, our Pointfilter is still able to produce better results.
Point clouds with strong outliers. Although the Pointfilter is not particularly designed for outlier removal, it still produces competitive results on point clouds with large outliers. As suggested by [Lu2018TVCG], we conduct an experiment comparing our method with the LOP-based methods (WLOP, CLOP), which are robust to outliers due to their $\ell_1$-norm term. This experiment demonstrates the capability of our Pointfilter in dealing with strong outliers. From the results in Fig. 14, we observe that the Pointfilter generates results comparable to WLOP and CLOP.
4.5. Quantitative Comparisons
Besides the above visual comparisons, we also compare all methods quantitatively, using the metrics defined above as well as the runtime of each method.
Errors. We calculate the above metrics for all compared methods on several point clouds, obtained by adding synthetic noise to the ground truth. To depict the distance errors, we calculate the mean square error (MSE) for each ground-truth point via Eq. (8) and visualize the results in Fig. 9. From these visualizations, it can be seen that the Pointfilter generates comparable or superior results, especially near sharp features. To evaluate our Pointfilter comprehensively, we also use the Chamfer distance (CD) to compute the overall error between a filtered point cloud and its ground truth. As illustrated in Fig. 15, we calculate the average errors of the models appearing in Fig. 9 in terms of MSE and CD, respectively. Although the results of RIMLS are comparable to ours, RIMLS requires trial-and-error parameter tuning to obtain satisfactory results. Such tuning is tedious and time-consuming, and becomes difficult for users without background knowledge. By contrast, our method is automatic and simple to use, and is the most accurate among all compared approaches.
Runtime. Because surface reconstruction is an application on top of point clouds, we only measure the runtime of each point set filtering method. In particular, the optimization-based methods (RIMLS, GPF, WLOP and CLOP) involve multiple steps and require trial-and-error parameter tuning to produce decent visual results, which means they generally require a much longer "runtime" in practice. Thus, we only consider the learning-based methods (EC-Net, PCN and ours) in terms of time consumption at the test stage. Table 2 summarizes the runtime of each learning-based method on several point clouds under the same configuration. As Table 2 shows, EC-Net is the fastest of the learning-based methods, as it is an upsampling method and only requires a few patches evenly distributed on the noisy input in the test phase. For fair comparison, we only count the runtime of the noise removal module of PCN, since its outlier removal module would introduce extra time consumption. Even so, PCN is still the slowest. Our approach ranks second in speed, which we attribute to its point-wise inference.
5. Conclusion
In this paper, we proposed the Pointfilter framework for feature-preserving point cloud filtering. Our architecture can be easily trained. Given an input noisy point cloud, our method automatically infers the involved displacement vectors and thereby the filtered point cloud with preserved sharp features. Extensive experiments and comparisons showed that our method outperforms state-of-the-art point set filtering techniques (or is comparable to optimization-based methods like RIMLS and GPF, which need trial-and-error parameter tuning), in terms of both visual quality and evaluation errors. Our approach is automatic and also achieves impressive performance at test time. It should be noted that, compared to PCN [Rakotosaona2019CGF], our Pointfilter is not designed for large-scale outlier removal; our Pointfilter and PCN are thus complementary, in terms of sharp feature preservation and heavy outlier removal, respectively.
Our method has a few limitations. First, our Pointfilter struggles to retain sharp features when handling excessive noise (see Fig. 16). Second, our method fails to handle significant holes in point clouds (see Fig. 17). In the future, we would like to incorporate global shape information into our framework to help guide point cloud filtering.