1 Introduction
A point cloud is a geometric data type consisting of an unordered collection of 3D points that sample the 2D surfaces of physical objects or entire scenes. Point clouds are becoming increasingly popular due to the availability of instruments such as LiDARs and to the interest in exploiting the richness of the geometric representation in challenging applications such as autonomous driving. However, the acquisition process is imperfect, and a significant amount of noise typically affects the raw point clouds. Point cloud denoising methods are therefore of paramount importance to improve the performance of downstream tasks such as shape matching, surface reconstruction, object segmentation and more.
Traditional model-based techniques [1, 20, 11, 16, 15, 5] have typically focused on fitting a surface to the noisy data. Such techniques work well in low-noise settings but usually suffer from oversmoothing, especially in the presence of high amounts of noise or geometries with sharp edges. Given the success of learning-based methods, in particular deep neural networks, in a wide variety of tasks, including image denoising and restoration [33, 18], a few works have recently started exploring point cloud denoising with deep neural networks. The most challenging aspects of processing point clouds are the lack of a regular domain, such as a grid, and the fact that a point cloud is just a set of points, so any permutation of them still represents the same data. Any learning-based method must therefore learn a permutation-invariant
function that can deal with data defined on an irregular domain. This is a significant challenge that point cloud processing algorithms have tackled either by approximating the irregular domain with a grid, e.g., by building voxels, or by building a permutation-invariant function as a composition of operations acting on single points (e.g., size-1 convolutions) and a globally symmetric function (e.g., a max pooling), as done by PointNet
[21]. Neither of these solutions is completely satisfactory. The former introduces an undesirable approximation, while the latter lacks the expressiveness of convolutional neural networks (CNNs), where the convolution operation extracts features that are localized functions of the neighborhood of a pixel and features of features are assembled hierarchically by means of multiple layers, progressively expanding the receptive field. Recently, graph convolution
[3] has emerged as an elegant way to build operators that are intrinsically permutation-invariant and defined on irregular domains, while also exploiting some of the useful properties of traditional convolution, such as localization and compositionality of the features as well as efficient weight reuse. In particular, spatial-domain definitions of graph convolution have recently been applied to several problems involving point clouds, such as classification [26], segmentation [31], shape completion [17] and generation [29]. Notably, the point cloud denoising problem has yet to be addressed with graph-convolutional neural networks. In this paper, we propose a deep graph-convolutional neural network for denoising of point cloud geometry. The proposed architecture has an elegant fully-convolutional behavior that, by design, can build hierarchies of local or non-local features to effectively regularize the denoising problem. This is in contrast with other methods in the literature that typically work on fixed-size patches or apply global operations [22, 9]. Moreover, dynamically computing the graph from similarities among the high-dimensional feature-space representations of the points uncovers more complex latent correlations than defining neighborhoods in the noisy 3D space. Extensive experimental results show a significant improvement over state-of-the-art methods, especially in the challenging conditions of high noise levels. The proposed approach is also robust to structured noise distributions such as the ones encountered in real LiDAR acquisitions.
2 Related work
The literature on 3D point cloud denoising is vast and can be subdivided into four categories: local surface fitting methods [1, 20, 11, 16, 15, 5], sparsity-based methods [2, 27, 19], graph-based methods [32, 8, 24], and learning-based methods [22, 14, 9, 23]. Among the methods in the first category, the moving least squares (MLS) approach [1] and its robust extensions [20, 11] are the most widely used. Other surface fitting methods have also been proposed for point cloud denoising, such as jet fitting [5] or the parameterization-free locally optimal projection (LOP) operator [16, 15]. These methods achieve remarkable performance at low levels of noise, but they suffer from oversmoothing when the noise level is high [13].
A second class of point cloud denoising methods [2, 27, 19] is based on sparse representations. In this case, the denoising procedure is composed of two minimization problems with sparsity constraints: the first estimates the surface normals, and the second uses them to update the point positions. However, at high levels of noise the normal estimation can be very poor, leading to oversmoothing or oversharpening [27].
Another approach to point cloud denoising is derived from the theory of graph signal processing [25]. These methods [32, 8, 24] first define a graph whose nodes are the points of the point cloud. Then, graph total variation (GTV)-based regularization is applied for denoising. These techniques achieve very strong performance when the noise level is low. At high noise levels, however, the graph construction can become unstable, negatively affecting the denoising performance.
In recent years, learning-based methods [22, 14, 9, 23], especially those based on deep learning, have been gaining attention. Extending convolutional neural networks to point cloud data is not straightforward, due to the irregular positioning of the points in space. However, in the context of shape classification and segmentation, many methods have recently been proposed specifically to handle point cloud data. PointNet
[21] is one of the most relevant works in this field; each point is processed independently before applying a global aggregation. Recently, a few methods proposed to extend the approach of PointNet to point cloud denoising. PointCleanNet [22] uses an approach similar to PointNet to estimate correction vectors for the points in the noisy point cloud. Instead, in
[9] the authors use a neural network similar to PointNet to estimate a reference plane for each noisy point and obtain the denoised point cloud by projecting each noisy point onto its reference plane. PointProNet [23] also performs point cloud denoising by employing an architecture similar to PointNet to estimate the local directions of the surface. However, the main drawback of these PointNet-based techniques is that they work on individual points and then apply a global symmetric aggregation function, without exploiting the local structure of the neighborhood. PointCleanNet addresses this issue by taking local patches as input instead of the entire point cloud. However, this solution is still limited by the fact that the network cannot learn hierarchical feature representations, like standard CNNs. Graph-convolutional networks have shown promising performance on tasks such as segmentation and classification. In particular, DGCNN [31] first introduced the idea of a dynamic graph update in the hidden layers of a graph-convolutional network. However, the denoising problem is significantly different from the classification and segmentation tasks addressed in [31], which rely more on global features than on localized representations.
In particular, several design choices make DGCNN unsuitable for point cloud denoising. First, the spatial transformer block is not useful for denoising: it seeks a canonical global representation, whereas denoising is mostly concerned with local representations of point neighborhoods, and it also significantly increases the computational complexity for large point clouds. Second, the graph convolution operation uses a max operator in the aggregation, which is unstable in the presence of noise. Third, the specific graph convolution definition is less general than the one presented in this paper, which implements adaptive filters whose aggregation weights depend on the feature vectors instead of being fixed as in [31], and incorporates an edge attention term that is especially important in the presence of noise, because it promotes a low-pass behavior by penalizing edges with large feature variations.
3 Proposed method
In this section we present the proposed Graph-convolutional Point Denoising Network (GPDNet), a deep neural network architecture that denoises the geometry of point clouds by means of graph-convolutional layers. The focus of the paper is to investigate the potential of graph convolution as a simple and elegant way of dealing with the permutation invariance problem encountered when processing point clouds. For this reason, we analyze the network in a discriminative learning setting where a clean reference is available and is perturbed with white Gaussian noise. We refer the reader to [14] for a technique to train any point cloud denoising network in an unsupervised fashion using only noisy data.
3.1 Architecture
An overview of the architecture of GPDNet is shown in Fig. 1. At a high level, it is a residual network that estimates the noise component of the input point cloud, which has been shown [33] to be easier than directly cleaning the data. A first block is composed of three single-point convolutions that gradually transform the 3D space into a higher-dimensional feature space. Then a cascade of two residual blocks is used, each having an input-output skip connection to reduce vanishing gradient issues. Each residual block is composed of three graph-convolutional layers. The graph is computed by selecting, for each point, its nearest neighbors in terms of Euclidean distance in the feature space. Notice that the graph construction is dynamic: it is updated after every residual block but shared among the graph-convolutional layers inside a block to limit computational complexity. Dynamic construction of a similarity graph has been shown to induce more powerful feature representations [31, 29] and, in the context of a residual denoising network, it progressively uncovers the latent correlations that have not yet been eliminated. Intuitively, dynamic graph construction is preferable to building the graph in the noisy 3D space, since neighborhoods might be strongly perturbed at high noise variances, leading to unstable or suboptimal neighbor assignments. All layers are interleaved with batch normalization, which stabilizes training, especially in the presence of Gaussian noise. Finally, the last graph-convolutional layer projects the features back to the 3D space.
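As a rough illustration of the residual, single-point-convolution front end described above, the following numpy sketch (with made-up layer widths, not the actual GPDNet configuration) shows how per-point shared weights yield a permutation-equivariant map and how the residual output is formed:

```python
import numpy as np

rng = np.random.default_rng(0)

def single_point_conv(x, W, b):
    # A single-point (1x1) convolution: the same linear map is applied
    # independently to every point, hence it is permutation-equivariant.
    return np.maximum(x @ W + b, 0.0)  # ReLU

# Illustrative sizes only (not the paper's layer widths).
N, F = 1024, 32
noisy = rng.normal(size=(N, 3))
W1, b1 = 0.1 * rng.normal(size=(3, F)), np.zeros(F)
W2, b2 = 0.1 * rng.normal(size=(F, 3)), np.zeros(3)

# Residual structure: the network estimates the noise component,
# which is then subtracted from the input point cloud.
features = single_point_conv(noisy, W1, b1)
predicted_noise = features @ W2 + b2
denoised = noisy - predicted_noise

# Permuting the input points permutes the output in exactly the same way.
perm = rng.permutation(N)
features_perm = single_point_conv(noisy[perm], W1, b1)
```

In the real network the middle layers are graph convolutions rather than a single linear map, but the residual subtraction and the per-point weight sharing follow this pattern.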
3.2 Graphconvolutional layer
The core of the proposed architecture is the graph-convolutional layer. Graph convolution is a generalization of convolution to data defined over the nodes of a general graph rather than a grid. Multiple definitions of graph convolution have been proposed to capture salient properties of classical convolution, notably localization and weight reuse. In this paper, we use a modified version of the Edge-Conditioned Convolution (ECC) [26] to address vanishing gradients and over-parameterization. In particular, we use some of the approximations introduced in [28] in the context of image denoising.
The graph-convolutional layer has two inputs: a tensor representing a feature vector for each point, and a graph whose nodes are the points and whose edges represent similarities between points. The output feature vector $\mathbf{h}_i^{l+1}$ of point $i$ at layer $l+1$ is computed by performing a weighted aggregation over its neighborhood $\mathcal{N}_i$ as defined by the graph:

$$\mathbf{h}_i^{l+1} = \mathbf{W}^l \mathbf{h}_i^l + \sum_{j \in \mathcal{N}_i} \gamma_{i,j}^l \, \boldsymbol{\Theta}_{j \to i}^l \, \mathbf{h}_j^l \qquad (1)$$

The weights include a self-loop matrix $\mathbf{W}^l$ which is shared among all points. The other weights in the aggregation, i.e., the vectors composing $\boldsymbol{\Theta}_{j \to i}^l$ and a scalar, are computed as a function of the difference between the feature vectors of point $j$ and point $i$, i.e., of $\mathbf{h}_j^l - \mathbf{h}_i^l$. This function is implemented as a multilayer perceptron (MLP) with two layers, where the final fully-connected layer is approximated by means of a stack of circulant matrices, since the number of free parameters would otherwise be very large. A hyperparameter sets the maximum rank of the aggregation weight matrix $\boldsymbol{\Theta}_{j \to i}^l$, which is obtained by explicitly computing the outer products of the aforementioned vectors, again to reduce the number of parameters and the memory requirements of the aggregation operation. The term $\gamma_{i,j}^l$ is a scalar edge attention weight which decays exponentially with the Euclidean distance between the feature vectors across an edge:

$$\gamma_{i,j}^l = e^{-\|\mathbf{h}_j^l - \mathbf{h}_i^l\|_2^2 / \delta} \qquad (2)$$

with $\delta$ a decay hyperparameter.
This definition of graph convolution has some advantages over alternatives such as GraphSAGE [12], FeaStNet [30] or DGCNN [31]. In particular, the aggregation weights are functions of feature differences, which makes the filtering operation performed by the graph-convolutional layer adaptive. Moreover, since the weight-generating function is implemented as an MLP, it is more general than a fixed function with a few learnable parameters.
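A toy numpy sketch may clarify the structure of the aggregation in Eqs. (1) and (2). The rank-1 bilinear weight generator below is a hypothetical stand-in for the paper's circulant-approximated two-layer MLP, purely for illustration:

```python
import numpy as np

def edge_attention(hi, hj, delta):
    # Edge attention of Eq. (2): decays exponentially with the squared
    # Euclidean distance between feature vectors across the edge.
    return np.exp(-np.sum((hj - hi) ** 2) / delta)

def graph_conv(H, neighbors, W_self, theta_fn, delta):
    # One aggregation in the spirit of Eq. (1): a shared self-loop
    # transform plus an attention-weighted sum over the neighborhood,
    # with edge weight matrices generated from feature differences.
    N, F = H.shape
    out = H @ W_self
    for i in range(N):
        acc = np.zeros(F)
        for j in neighbors[i]:
            gamma = edge_attention(H[i], H[j], delta)
            Theta = theta_fn(H[j] - H[i])  # weights depend on the edge
            acc += gamma * (Theta @ H[j])
        out[i] += acc / max(len(neighbors[i]), 1)
    return out

rng = np.random.default_rng(1)
F = 8
A, B = rng.normal(size=(F, F)), rng.normal(size=(F, F))
# Hypothetical rank-1 weight generator (the paper uses a low-rank sum
# of such terms produced by an MLP).
theta_fn = lambda d: 0.1 * np.outer(A @ d, B @ d)
H = rng.normal(size=(5, F))
neighbors = {i: [j for j in range(5) if j != i][:3] for i in range(5)}
out = graph_conv(H, neighbors, np.eye(F), theta_fn, delta=10.0)
gamma01 = edge_attention(H[0], H[1], delta=10.0)
```

The loop form is for clarity; a practical implementation would batch the neighborhood gathers and weight generation.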
The graph is constructed by searching for the nearest neighbors of each point in terms of the Euclidean distance between their feature vectors. To limit complexity, the search is restricted to an area of predefined size centered around the point, defined, e.g., as a fixed number of neighbors in the noisy 3D space (see Fig. 4 for a visual representation of the search area and feature-space neighborhoods).
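The search-area-restricted neighbor selection can be sketched as follows (a hypothetical brute-force numpy implementation; function and parameter names are ours, and a KD-tree would be used in practice):

```python
import numpy as np

def knn_in_search_area(pos, feat, k, area_size):
    # Dynamic graph construction: for each point, first restrict the
    # candidates to the `area_size` spatially closest points, then pick
    # the k nearest neighbors by Euclidean distance in feature space.
    N = pos.shape[0]
    d_pos = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    neighbors = np.empty((N, k), dtype=int)
    for i in range(N):
        # Exclude the point itself (its spatial distance is zero).
        area = np.argsort(d_pos[i])[1:area_size + 1]
        d_feat = np.linalg.norm(feat[area] - feat[i], axis=-1)
        neighbors[i] = area[np.argsort(d_feat)[:k]]
    return neighbors

rng = np.random.default_rng(2)
pos = rng.normal(size=(50, 3))    # noisy 3D positions
feat = rng.normal(size=(50, 8))   # hidden feature vectors
nb = knn_in_search_area(pos, feat, k=4, area_size=10)
```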
We remark that GPDNet is fully-convolutional thanks to the graph convolution operation. By fully-convolutional we mean that the output feature vector of each point at a given layer is obtained as a multi-point aggregation of the feature vectors of neighboring points in the previous layer, thus building complex hierarchies of aggregations. This is in contrast with PointCleanNet [22], which processes each patch independently to estimate the denoised version of its central point. That approach does not create hierarchies of features obtained by successive multi-point aggregations, as a classical CNN does. The graph-convolutional structure recovers this behavior and can learn more powerful feature spaces.
3.3 Loss functions
We consider two loss functions to train the proposed method in a supervised setting. The first one is the mean squared error (MSE) between the denoised point cloud $\hat{\mathbf{x}}$ and its noiseless ground truth $\mathbf{x}$, i.e.:

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\mathbf{x}}_i - \mathbf{x}_i \right\|_2^2 \qquad (3)$$

with $N$ the number of points in the point cloud. This is the most natural choice in the presence of Gaussian noise. However, it does not exploit prior knowledge about the distribution of the points: it ignores the fact that the points lie on a surface, so the tangential component of the noise is not as relevant as the normal component.
This property can be incorporated by regularizing the MSE loss with a term measuring the distance of each denoised point from the ground-truth surface. This measure can be approximated by the proximity-to-surface metric, which computes the distance between a denoised point and the closest ground-truth point. The loss function (MSE-SP) then becomes:

$$\mathcal{L}_{\text{MSE-SP}} = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\mathbf{x}}_i - \mathbf{x}_i \right\|_2^2 + \frac{\lambda}{N} \sum_{i=1}^{N} \min_{j} \left\| \hat{\mathbf{x}}_i - \mathbf{x}_j \right\|_2^2 \qquad (4)$$

for a regularization hyperparameter $\lambda$. Other works also consider proximity to surface in the loss function. Notably, PointCleanNet [22] uses a loss that combines the proximity to surface with a dual term measuring the distance between a ground-truth point and the closest denoised point. This is done to ensure that the denoised points do not collapse into filament structures. We found that using the MSE to enforce this property provides better results.
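The combined loss can be sketched in numpy as follows (whether the proximity term is squared, and the exact normalization, are our assumptions here):

```python
import numpy as np

def mse_sp_loss(denoised, clean, lam):
    # MSE term: squared error between corresponding points.
    mse = np.mean(np.sum((denoised - clean) ** 2, axis=1))
    # Proximity-to-surface term: squared distance of each denoised
    # point to its closest ground-truth point (an assumption; the
    # paper may use the unsquared distance).
    d = np.linalg.norm(denoised[:, None, :] - clean[None, :, :], axis=-1)
    sp = np.mean(np.min(d, axis=1) ** 2)
    return mse + lam * sp

rng = np.random.default_rng(0)
clean = rng.normal(size=(20, 3))
loss_zero = mse_sp_loss(clean, clean, lam=0.1)       # perfect denoising
loss_noisy = mse_sp_loss(clean + 0.05, clean, lam=0.1)
```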
4 Experimental results
In this section, an experimental evaluation against state-of-the-art approaches as well as an analysis of the proposed technique is performed. Code is available online at https://github.com/diegovalsesia/GPDNet.
4.1 Experimental setting
The training and test sets are created by selecting post-processed subsets of the ShapeNet [6] repository. This database is composed of 3D models of 55 object categories, each one described by a collection of meshes. Before use, the data have to be sampled and normalized. First, we sample 30720 uniformly distributed points from each model; then we rescale the obtained point clouds by normalizing their diameter, ensuring that all data are at the same scale. More than 100000 patches of 1024 points each are randomly selected from the point clouds to create the training set, taking point clouds from all categories except the 10 reserved for the test set. Each patch is created by randomly selecting a point from a point cloud and collecting its 1023 closest points. The test set consists of 100 point clouds taken from ten categories: airplane, bench, car, chair, lamp, pillow, rifle, sofa, speaker, table. We randomly select ten models from each category and sample 30720 uniformly distributed points from each model.
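The diameter normalization and patch extraction described above can be sketched as follows (numpy; function names and the toy sizes are illustrative, not the paper's code):

```python
import numpy as np

def normalize_diameter(points):
    # Rescale so that the point cloud diameter (largest pairwise
    # distance) equals 1, putting all models at the same scale.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return points / d.max()

def extract_patch(points, patch_size, rng):
    # A training patch: a randomly chosen seed point together with its
    # patch_size - 1 closest points.
    seed = rng.integers(points.shape[0])
    d = np.linalg.norm(points - points[seed], axis=1)
    return points[np.argsort(d)[:patch_size]]

rng = np.random.default_rng(3)
pts = rng.normal(size=(1200, 3))      # stand-in for a sampled mesh
pts_n = normalize_diameter(pts)
patch = extract_patch(pts_n, 1024, rng)
# Recompute the diameter to verify the normalization.
diam = np.linalg.norm(pts_n[:, None, :] - pts_n[None, :, :], axis=-1).max()
```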
GPDNet is trained for a fixed noise variance for approximately 700000 iterations with a batch size of 16. The number of features is 99 for all layers, except for the first three single-point convolutional layers, where the number of features is gradually increased from 33 to 66 and finally to 99. The Adam optimizer is employed with a fixed learning rate. Concerning the graph-convolutional implementation, the rank of the low-rank approximation is set to 11 and 3 circulant rows are considered for the construction of the circulant matrices. During testing, GPDNet takes the whole point cloud as input, and a search area is associated with each point, within which the neighbors are searched and identified. Unless otherwise stated, 16 nearest neighbors in terms of Euclidean distance are used for graph construction.
4.2 Comparisons with stateoftheart
Class  Noisy  DGCNN [31]  APSS [11]  RIMLS [20]  AWLOP [15]  MRPCA [19]  GLR [32]  PCN [22]  GPDNet MSE  GPDNet MSE-SP
airplane  50.32  44.82  28.22  39.73  31.27  28.19  19.56  26.36  17.22  17.58 
bench  48.71  38.70  26.97  32.76  34.08  32.93  20.43  27.64  19.33  19.80 
car  64.34  60.47  47.73  55.56  54.21  44.33  42.22  75.34  38.09  38.14 
chair  60.78  59.69  37.31  45.65  47.91  38.41  34.98  55.10  29.50  29.69 
lamp  59.73  52.54  24.57  34.02  35.23  31.51  19.67  20.58  16.17  17.15 
pillow  69.79  64.28  15.64  21.23  46.36  23.95  17.59  21.07  17.11  19.04 
rifle  38.97  26.99  36.01  49.37  27.79  23.49  15.84  15.09  14.45  14.00 
sofa  69.63  65.05  22.27  28.04  53.08  32.14  30.88  43.36  25.87  27.21 
speaker  73.50  68.72  26.50  30.19  58.92  47.57  40.78  76.09  34.87  35.81 
table  56.21  50.17  27.45  32.63  41.26  34.78  27.12  43.02  24.27  24.64 
Class  Noisy  DGCNN [31]  APSS [11]  RIMLS [20]  AWLOP [15]  MRPCA [19]  GLR [32]  PCN [22]  GPDNet MSE  GPDNet MSE-SP
airplane  97.78  84.40  86.42  106.33  73.32  67.39  36.76  35.27  28.47  27.62 
bench  94.82  64.76  75.51  91.93  82.04  70.05  32.19  30.10  28.72  26.96 
car  102.23  93.43  72.56  103.52  93.38  69.88  55.92  92.23  52.92  51.77 
chair  105.16  94.45  81.47  104.38  92.47  73.45  48.62  69.18  46.28  43.73
lamp  120.65  112.06  65.79  82.40  88.78  77.09  39.93  30.59  27.37  28.60 
pillow  132.57  113.32  22.74  42.54  112.54  73.67  31.38  29.02  23.32  27.25 
rifle  80.40  61.04  92.14  110.51  69.35  55.65  31.81  21.45  28.43  22.48 
sofa  121.02  99.63  42.80  69.92  107.58  72.62  51.12  61.15  40.10  42.04 
speaker  123.27  114.12  46.45  58.28  110.29  77.95  53.75  87.68  49.20  49.57 
table  103.50  84.95  62.64  78.21  89.33  70.87  37.94  43.88  36.06  33.89 
Class  Noisy  DGCNN [31]  APSS [11]  RIMLS [20]  AWLOP [15]  MRPCA [19]  GLR [32]  PCN [22]  GPDNet MSE  GPDNet MSE-SP
airplane  161.79  127.44  175.68  186.24  145.94  123.71  90.55  74.17  45.96  42.30 
bench  161.52  99.36  166.85  182.42  157.29  127.51  83.99  90.34  41.24  36.77 
car  148.74  113.94  141.69  167.78  145.51  109.49  77.56  160.08  72.06  67.43 
chair  163.75  132.91  160.01  155.38  158.12  122.70  79.85  145.56  67.91  60.16 
lamp  204.05  153.02  178.08  198.22  187.31  146.41  109.24  85.31  45.21  44.60 
pillow  215.58  190.32  164.83  196.53  206.14  150.65  85.86  92.84  34.47  38.58 
rifle  144.18  131.91  195.68  176.07  144.22  105.87  89.19  71.57  43.07  29.55 
sofa  184.11  155.51  166.34  190.91  178.93  133.98  89.31  144.72  62.58  65.06 
speaker  186.01  136.72  138.80  162.34  180.45  126.17  84.37  160.26  66.57  63.40 
table  168.32  115.00  171.25  179.81  162.36  125.72  78.06  102.17  50.47  44.80 
In this section the proposed method is compared with state-of-the-art point cloud denoising methods. As described in Section 2, different categories of point cloud denoising methods are present in the literature, and the experiments include at least one algorithm from each category. APSS [11] and RIMLS [20] are well-known MLS-based surface fitting methods and were tested using the MeshLab software [7]. AWLOP [15] is another surface fitting method and was run with the software released by the authors. MRPCA [19] is a sparsity-based method and was run with the code provided by the authors. GLR [32] is one of the most promising works in the graph-based category and was also run with the code provided by the authors. PointCleanNet (PCN) [22]
is one of the most recent learning-based methods and its code is publicly available. To ensure a fair comparison, PointCleanNet was retrained with additive Gaussian noise at a specific standard deviation, instead of using the blind model released by the authors. We also include a modified version of DGCNN
[31] as an additional baseline. This modified version replaces the segmentation head with a single-point convolution to regress the point displacement. As a metric to evaluate the performance of the proposed method, we compute the Chamfer measure, also called Cloud-to-Cloud (C2C) distance. This metric is widely used in point cloud denoising because it computes an average distance of the denoised points from the original surface. First, the mean distance between each denoised point and its closest ground-truth point is computed; then, the mean distance between each ground-truth point and its closest denoised point. The Chamfer measure is their average:
$$C2C = \frac{1}{2} \left( \frac{1}{N} \sum_{i=1}^{N} \min_j \left\| \hat{\mathbf{x}}_i - \mathbf{x}_j \right\|_2 + \frac{1}{N} \sum_{j=1}^{N} \min_i \left\| \mathbf{x}_j - \hat{\mathbf{x}}_i \right\|_2 \right) \qquad (5)$$
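The Chamfer measure of Eq. (5) can be computed directly in numpy (a brute-force sketch, adequate only for small clouds; a KD-tree would be used for the full test set):

```python
import numpy as np

def chamfer(denoised, clean):
    # Symmetric Chamfer / cloud-to-cloud distance: the average of the
    # denoised-to-clean and clean-to-denoised mean closest-point distances.
    d = np.linalg.norm(denoised[:, None, :] - clean[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

rng = np.random.default_rng(4)
a = rng.normal(size=(100, 3))
b = a + 0.1  # a rigidly shifted copy of a
c_same = chamfer(a, a)
c_ab, c_ba = chamfer(a, b), chamfer(b, a)
```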
The results at different noise levels are reported in Tables 1, 2 and 3. As described in Sec. 3.3, we consider two different loss functions, obtaining two versions of the proposed method, namely GPDNet MSE and GPDNet MSE-SP. Both versions significantly outperform state-of-the-art methods, especially at the medium and high noise levels of Tables 2 and 3. At the low noise level of Table 1, the other algorithms become more competitive and the performance gap decreases, but the proposed method still obtains the best results in the majority of the categories. This can be explained by the fact that most of the other methods involve surface reconstruction or normal estimation, operations that cannot be computed with sufficient accuracy at high levels of noise, whereas the proposed method directly estimates the denoised point cloud. In addition, Table 3 shows that GPDNet MSE-SP is particularly effective at high levels of noise, outperforming GPDNet MSE in almost all categories. This behavior can be explained by the regularizing effect of the surface distance component of the loss, which is especially useful at high noise variance because it incorporates more prior knowledge about the data. The performance difference between the two variants decreases at low noise levels, as shown in Table 1. It is worth noting that DGCNN shows poor performance for the reasons explained in Sec. 2, being originally designed for classification and segmentation. This is in line with the results presented in the PointCleanNet paper [22], where the authors also show the poor denoising performance of DGCNN, and highlights the importance of the design in this paper, which is tailored to the denoising task.
We also consider another metric for a quantitative assessment of the denoised point clouds. In particular, we assess whether an off-the-shelf algorithm for surface normal estimation produces more accurate normals when provided with point clouds denoised by the proposed method. Since surface normals are widely used in many applications, measuring their quality when extracted from the denoised data is a relevant characterization of a denoiser. In this experiment we consider a different test set, composed of 5 well-known point clouds: Armadillo, Bunny, Column, Galera and Tortuga. The change of dataset is motivated by the availability of ground-truth normals for these point clouds. For every denoising method in the comparison, we compute the unoriented normal vector of each point in the denoised point cloud. The algorithm employed for normal estimation is the built-in MATLAB function, which is based on principal component analysis. We compute the unoriented normal angle error (UNAE) as
$$\mathrm{UNAE} = \frac{1}{N} \sum_{i=1}^{N} \arccos \left( \frac{\left| \mathbf{n}_i \cdot \hat{\mathbf{n}}_i \right|}{\left\| \mathbf{n}_i \right\| \left\| \hat{\mathbf{n}}_i \right\|} \right) \qquad (6)$$

where $\mathbf{n}_i$ is the ground-truth normal vector at point $\mathbf{x}_i$ and $\hat{\mathbf{n}}_i$ is the estimated normal vector at the denoised point closest to $\mathbf{x}_i$. Table 4 reports the average error across the five test point clouds. A minimum error of about six degrees is observed because the MATLAB algorithm introduces a nonzero estimation error in the computation of the normals, as can be seen from the first column of Table 4. The proposed denoising method, in particular the version with only MSE as loss function, increases the accuracy of the normal estimation, outperforming the state of the art at every noise level considered. It is also interesting to notice that learning-based methods are more stable to noise than model-based methods, as their performance degrades more gracefully with increasing noise variance.
Noise std.  Clean  Noisy  DGCNN [31]  APSS [11]  RIMLS [20]  AWLOP [15]  MRPCA [19]  GLR [32]  PCN [22]  GPDNet MSE  GPDNet MSE-SP
0.01  6.44  31.13  30.83  22.60  24.52  29.79  31.40  21.90  26.85  20.11  22.33 
0.015  6.44  32.77  32.52  31.83  37.35  32.17  39.97  25.99  27.54  21.16  24.46 
0.02  6.44  33.77  32.31  42.42  45.86  33.41  42.45  31.30  28.65  22.78  27.06 
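Once each denoised point has been matched to its closest ground-truth point, the UNAE of Eq. (6) reduces to a simple computation over paired normals. A numpy sketch (the closest-point matching step is omitted):

```python
import numpy as np

def unoriented_normal_angle_error(n_true, n_est):
    # Unoriented angle between normals: the absolute value of the dot
    # product discards the arbitrary orientation of estimated normals.
    cos = np.abs(np.sum(n_true * n_est, axis=1))
    cos /= np.linalg.norm(n_true, axis=1) * np.linalg.norm(n_est, axis=1)
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))).mean()

n = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
err_same = unoriented_normal_angle_error(n, n)    # identical normals
err_flip = unoriented_normal_angle_error(n, -n)   # flipped orientation
err_orth = unoriented_normal_angle_error(np.array([[1.0, 0.0, 0.0]]),
                                         np.array([[0.0, 1.0, 0.0]]))
```

Note that flipping the estimated normals leaves the error unchanged, which is exactly the unoriented behavior the metric is designed for.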
Fig. 2 shows qualitative results at a medium noise level by presenting the denoised point cloud for each method. The surface distance of each point is visualized in the figure to understand the position of the denoised points with respect to the ground truth. The root mean square value of the surface distance (RMSD) can be computed as:
$$\mathrm{RMSD} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \min_j \left\| \hat{\mathbf{x}}_i - \mathbf{x}_j \right\|_2^2 } \qquad (7)$$
It can be seen that, on average, both versions of our method provide a lower point-to-surface distance and that the shape of the reconstructed point cloud is more similar to the original one. Fig. 3 shows another qualitative comparison, displaying the unoriented normal estimation error for each denoised point. The proposed method, especially the version with only MSE, provides lower normal estimation errors, highlighting the higher quality of the denoised point cloud.
4.3 Ablation studies
We study the behavior of GPDNet in terms of a few design choices. In particular, we first investigate the impact of dynamic graph computation, i.e., updating the graph from the hidden feature space as in Fig. 1, as opposed to a fixed graph construction where neighbors are identified in the noisy 3D space and used for all graph-convolutional layers. Table 5 shows that the dynamic graph update provides improved performance thanks to refined neighbor selection.
We also study the impact of neighborhood size on the overall performance. Selecting a larger number of neighbors for graph convolution increases the size of the receptive field and can help denoise smooth areas of the point cloud by capturing more context, at the price of some loss of localization and increased computational complexity. This is related to results on image denoising [4], where it is known that the optimal size of the receptive field depends on the noise variance. Tables 6 and 7 show that increasing the number of neighbors is beneficial, up to a saturation point. We also see that the impact of a larger receptive field is more significant for the GPDNet MSE-SP variant.
GPDNet MSE (Dynamic)  GPDNet MSE (Fixed)  GPDNet MSE-SP (Dynamic)  GPDNet MSE-SP (Fixed)
C2C  35.68  37.00  36.99  38.45
UNAE (degrees)  23.56  23.75  26.29  26.65 
4-NN  8-NN  16-NN  24-NN

GPDNet MSE  28.27  24.43  23.69  23.84  
GPDNet MSE-SP  30.38  25.54  24.31  24.44  
GPDNet MSE  40.46  35.68  36.09  36.67  
GPDNet MSE-SP  46.05  36.99  35.39  35.80  
GPDNet MSE  58.88  50.34  52.96  55.45  
GPDNet MSE-SP  64.63  51.82  49.26  50.43
4-NN  8-NN  16-NN  24-NN

GPDNet MSE  27.22  22.51  20.11  20.89  
GPDNet MSE-SP  29.04  24.10  22.33  22.16  
GPDNet MSE  28.03  23.56  21.16  21.18  
GPDNet MSE-SP  31.31  26.29  24.46  23.80  
GPDNet MSE  31.09  25.67  22.78  22.98  
GPDNet MSE-SP  32.00  28.81  27.06  26.92
4.4 Feature analysis
We analyze the characteristics of the receptive field, i.e., the set of points whose feature vectors influence the features of a specific point, induced by the graph-convolutional layers. In Fig. 4 we show an example of the receptive field of a single point at the output of the graph-convolutional layers of a residual block, with respect to the input of the residual block. The visualization is on the denoised point cloud. We observe that the receptive field is quite localized in the 3D space and that its size increases with the number of layers. It is interesting to note that, since the graph is dynamically constructed in the feature space, the points of the receptive field are not just the spatially closest ones but are also among those with similar shape characteristics. For example, in Fig. 4 the considered point is on the lower side of the chair stretcher and all the points of the receptive field belong to the same part of the surface.
To better analyze this non-local property of the receptive field, we measure its radius in the 3D space and compare it to a fixed graph construction where the neighbors are determined by proximity in the noisy 3D space. Fig. 5 shows the radius of the receptive field of each point at the output of a residual block with respect to the input of the residual block. The radius is evaluated as the 90th-percentile Euclidean distance in the 3D space on the clean point cloud (the 90th percentile is used since the maximum might be an unstable metric). Notice that with the dynamic graph construction the radius is only slightly larger in the first residual block but can be significantly larger in the second one. This can be interpreted as the feature space building and exploiting more and more non-local features, with patterns similar to those in Fig. 4.
4.5 Structured noise
Noisy  PointCleanNet  GPDNet MSE  GPDNet MSE-SP

0.1447  0.0966  0.0664  0.0602 
To check whether the proposed architecture can generalize beyond white Gaussian noise, we train it on a simulated LiDAR dataset. We simulate scanning the ShapeNet objects with a Velodyne HDL-64E scanner using the Blensor software [10]. Two sources of noise are considered in the acquisition process: a laser distance bias with Gaussian distribution and a per-ray Gaussian noise. Both distributions are zero-mean, with a standard deviation equal to a fixed fraction of the longest side of the object bounding box. We also retrained PointCleanNet on the simulated data for comparison with a state-of-the-art model. Table 8 shows that the results follow those on white Gaussian noise, with the proposed method improving over PointCleanNet. Note that RMSD is used as the metric in place of the Chamfer measure, since it is better suited to the case where points are not uniformly distributed.

5 Conclusions
In this paper, we have presented a new graph-convolutional neural network for point cloud denoising. Thanks to the graph-convolutional layers, the proposed architecture is fully convolutional and can learn hierarchies of features, showing a behaviour similar to that of standard CNNs. The experimental results show that the proposed method provides a significant improvement over state-of-the-art techniques. In particular, it is robust to high levels of noise and to structured noise distributions, such as those observed in real LiDAR scans.
Acknowledgements. This material is based upon work supported by Google Cloud.
References
[1] (2003) Computing and rendering point set surfaces. IEEE Transactions on Visualization and Computer Graphics 9 (1), pp. 3–15.
[2] (2010) L1-sparse reconstruction of sharp point set surfaces. ACM Transactions on Graphics (TOG) 29 (5), pp. 135.
[3] (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
[4] (2012) Image denoising: can plain neural networks compete with BM3D?. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2392–2399.
[5] (2005) Estimating differential quantities using polynomial fitting of osculating jets. Computer Aided Geometric Design 22 (2), pp. 121–146.
[6] (2015) ShapeNet: an Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University, Princeton University, Toyota Technological Institute at Chicago.
[7] (2008) MeshLab: an Open-Source Mesh Processing Tool. In Eurographics Italian Chapter Conference.
[8] (2018) 3D point cloud denoising via bipartite graph approximation and reweighted graph Laplacian. arXiv preprint arXiv:1812.07711.
[9] (2019) 3D point cloud denoising via deep neural network based local surface estimation. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8553–8557.
[10] (2011) BlenSor: Blender Sensor Simulation Toolbox. In Advances in Visual Computing, pp. 199–208.
[11] (2007) Algebraic point set surfaces. In ACM Transactions on Graphics (TOG), Vol. 26, pp. 23.
[12] (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
[13] (2017) A review of algorithms for filtering the 3D point cloud. Signal Processing: Image Communication 57, pp. 103–112.
[14] (2019) Total Denoising: unsupervised learning of 3D point cloud cleaning. arXiv preprint arXiv:1904.07615.
[15] (2013) Edge-aware point set resampling. ACM Transactions on Graphics (TOG) 32 (1), pp. 9.
[16] (2007) Parameterization-free projection for geometry reconstruction. In ACM Transactions on Graphics (TOG), Vol. 26, pp. 22.
[17] (2018) Deformable shape completion with graph convolutional autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1886–1895.
[18] (2018) Non-local recurrent network for image restoration. In Advances in Neural Information Processing Systems, pp. 1673–1682.
[19] (2017) Point cloud denoising via moving RPCA. In Computer Graphics Forum, Vol. 36, pp. 123–137.
[20] (2009) Feature preserving point set surfaces based on non-linear kernel regression. In Computer Graphics Forum, Vol. 28, pp. 493–501.
[21] (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
[22] (2019) PointCleanNet: learning to denoise and remove outliers from dense point clouds. In Computer Graphics Forum.
[23] (2018) PointProNets: consolidation of point clouds with convolutional neural networks. In Computer Graphics Forum, Vol. 37, pp. 87–99.
[24] (2015) Graph-based denoising for time-varying point clouds. In 2015 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1–4.
[25] (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30 (3), pp. 83–98.
[26] (2017) Dynamic Edge-Conditioned Filters in convolutional neural networks on graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 29–38.
[27] (2015) Denoising point sets via L0 minimization. Computer Aided Geometric Design 35, pp. 2–15.
[28] (2019) Deep graph-convolutional image denoising. arXiv preprint arXiv:1907.08448.
[29] (2019) Learning localized generative models for 3D point clouds via graph convolution. In International Conference on Learning Representations (ICLR).
[30] (2018) FeaStNet: feature-steered graph convolutions for 3D shape analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2598–2606.
[31] (2019) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG) 38 (5), pp. 146.
[32] (2018) 3D point cloud denoising using graph Laplacian regularization of a low dimensional manifold model. arXiv preprint arXiv:1803.07252.
[33] (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155.