Recently, owing to advances in 3D acquisition technologies, point-based representations of complex objects and environments can be captured in many applications, such as autonomous driving and robot manipulation. With the proliferation of available data, the analysis of raw 3D point clouds has received great attention. A significant problem in shape analysis is to robustly estimate normals from a raw, unordered point cloud in the presence of varying sampling density, noise, and outliers. Precise normals provide extra information that improves several computer vision tasks, such as reconstruction [M2013Poisson], registration [Pomerleau2015A], and object segmentation [Grilli2017a].
The common approach is to sample local neighbors around each point and to estimate the normal from the statistics of these local points by fitting a local surface. However, the robust computation of normals is affected by many issues. Because raw point clouds lack connectivity or structure information, normal estimation is challenging, especially when real scanned models are noisy, incomplete, and exhibit varying sampling density. Classic methods are often sensitive to scale selection and unstable under real scan noise. Moreover, many man-made objects exhibit sharp features such as corners or edges, which are easily lost. Learning-based methods proposed in recent years often achieve far better results than data-independent methods. However, deep learning methods commonly select local points roughly as input and simply train a model to regress the normal of each point. They do not account for the outliers and error points that different sampling scales introduce into the patches. Here we define error points as points that may not belong to the local fitting plane. In particular, as the sampling radius or the point cloud noise increases, the accuracy of deep learning methods is increasingly affected by the input. State-of-the-art methods generally do not consider that the input patches may include many error points. Intuitively, if a normal estimation network has a stronger constraint on the point-wise features of the input patch, that is, a constraint that better distinguishes the features of points on the fitting plane from those of non-plane points, we would obtain more accurate normals at any sampling scale.
Our key insight is that local shape properties can be robustly estimated by suitably accounting for shape features, noise margin, and sampling distributions. However, such a relation is complex and difficult to account for manually. Hence, we propose a data-driven approach based on local point neighborhoods, similar to PCPNet [guerrero2018pcpnet], which trains a point network to directly learn local properties using ground truth reference results under different input perturbations. Unlike their architecture, we impose a stronger constraint on the network so that more meaningful, plane-aware features are obtained, and we also use a scale selection method to choose an appropriate scale. In this paper, we propose a Local Plane Features Constraint (LPFC) to better distinguish the feature differences between plane points and error points in each patch, and this plane-aware feature enhancement term improves normal estimation. In particular, our method gives impressive estimation results under large noise and for regular objects. In addition, our multi-scale strategy guides the network to select the best scale for each point. With the selected scale weights, our method obtains more accurate normals.
The main contributions of this paper are:
For single-scale normal estimation, a Local Plane Features Constraint (LPFC) is used in our networks to make the normal estimation network more robust to noisy point clouds at any sampling scale. In addition, the binary classifier used in our LPFC can recover the main part of the patch, which lies almost on the fitting plane, and distinguish the error points, especially when the sampling scale is large.
A scale selection strategy is employed in our method. We propose a novel scale estimation network, which selects the most suitable scale for each point through a joint analysis of multi-scale features extracted from the single-scale networks.
Experiments show that our method outperforms several state-of-the-art surface normal estimators in both single-scale and multi-scale settings.
2 Related Work
Normal estimation has a very long history in geometry processing, motivated in large part by its direct utility in shape reconstruction. In this section, we present an overview of traditional and learning-based normal estimation methods. We first give a brief description of the history of deep learning on 3D points.
2.1 Deep learning for 3D point clouds
The point cloud representation is challenging for deep learning methods because it is both unstructured and point-wise unordered. Early voxel-based shape representations [maturana2015voxnet; wu20153d; qi2016volumetric] directly extend 2D image grids to 3D grids and pioneered the application of 3D convolutional neural networks. However, the volumetric representation is constrained by its resolution, and local geometric information may be lost, so these learning-based representations cannot predict local properties well. Recently, there have been quite a few developments in learning features directly from 3D point clouds. The pioneering point cloud networks are PointNet [qi2017pointnet], PointNet++ [qi2017pointnet++], and PointCNN [li2018pointcnn]. All of the above-mentioned methods are used to learn global features for 3D object detection, classification, and retrieval tasks. Specifically, PointNet applies a symmetric, order-insensitive function to a high-dimensional representation of individual points. This network can extract geometric features directly from the original data. In this paper, we also employ an architecture similar to PointNet [qi2017pointnet] to directly extract plane-aware features of the local points and, finally, to regress the normals of the 3D model.
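The order-insensitivity mentioned above can be illustrated with a minimal sketch. The per-point function below is a hypothetical, hand-written stand-in for a learned shared MLP, and max pooling is used only as one example of a symmetric operation; neither is taken from the networks discussed in this paper.

```python
import random

def per_point_features(point):
    """Toy stand-in for a shared per-point MLP (hypothetical, not learned)."""
    x, y, z = point
    return (x + y + z, x * x + y * y + z * z, abs(x - y))

def global_feature(points):
    """Max-pool every feature channel over the points of the patch."""
    features = [per_point_features(p) for p in points]
    return tuple(max(channel) for channel in zip(*features))

patch = [(0.1, 0.2, 0.0), (-0.3, 0.5, 0.1), (0.4, -0.1, 0.2)]
shuffled = patch[:]
random.Random(0).shuffle(shuffled)

# Max pooling is symmetric, so the point order does not change the result.
print(global_feature(patch) == global_feature(shuffled))
```

Because the same function is applied to every point and the pooling ignores order, any permutation of the patch yields the same global feature.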
2.2 Traditional normal estimation
Early works on normal estimation are based on Principal Component Analysis (PCA) [hoppe1992surface], and a Singular Value Decomposition (SVD) formulation [klasing2009comparison] is also used for normal estimation. Both specify the neighbors within some scale and then use PCA or SVD to estimate a tangent plane. The performance of these approaches usually depends on the scale of the selected patches, and noise and outliers also influence the estimate. An obvious observation is that if the sampled patch contains boundaries or consists of more than one fitting plane, the final normal estimate gets worse. Several methods have been proposed to address these limitations, both by designing more robust estimation procedures capable of handling more challenging data and by proposing techniques for estimating normal orientation. Assigning Gaussian weights to the neighbors [Pauly2003Shape] and adapting the sampling radius [Mitra2004ESTIMATING] were proposed to address these common issues. Yoon et al. [Yoon2007Surface] apply ensemble statistics techniques to the classic PCA method to obtain more stable results. Other approaches [alexa2001point; cazals2005estimating; guennebaud2007algebraic] fit higher-order surfaces such as local spherical and quadric surfaces. These methods usually choose a large-scale neighborhood, which leads them to smooth sharp features and to fail in estimating normals near edges, and they do not address another major issue of real-world point clouds. Another family of approaches relies on the Voronoi cells of point clouds [Amenta1998Surface; Merigot2011Voronoi; Dey2004Provable]. Although these methods improve normal estimation at sharp points, their results are sensitive to noise. To handle this problem, Alliez et al. [Alliez2007Voronoi] proposed a PCA-Voronoi method, which provides some control over smoothness by grouping adjacent cells on the 3D model.
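To make the classic PCA estimator concrete, the following is a minimal sketch, assuming a small, non-degenerate patch given as a list of 3D tuples. The function names are hypothetical, and the power iteration on the shifted covariance matrix is our own choice for keeping the example dependency-free; it is not the procedure of any cited work.

```python
import math

def covariance(points):
    """3x3 covariance matrix of a list of 3D points."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - centroid[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                cov[a][b] += d[a] * d[b] / n
    return cov

def pca_normal(points, iters=200):
    """Unoriented unit normal: direction of least variance in the patch."""
    cov = covariance(points)
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # trace*I - cov has the same eigenvectors as cov with the eigenvalue
    # order reversed, so power iteration converges to the smallest
    # principal direction. Assumes the patch is not a single repeated point
    # and that the start vector is not orthogonal to the normal.
    m = [[(trace if a == b else 0.0) - cov[a][b] for b in range(3)]
         for a in range(3)]
    v = [0.3, 0.5, 0.8]
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points sampled on the plane z = 0; the normal should align with the z axis.
plane = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
normal = pca_normal(plane)
```

The sign of the returned vector is arbitrary, which is why such estimators are usually evaluated as unoriented normals.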
While many of these methods hold theoretical guarantees on approximation and robustness, in practice all of the above methods require careful parameter setting and often depend on special treatment in the presence of strong or structured noise. Unfortunately, there is no universal parameter set that works for all settings and shape types. Data-driven methods are therefore needed to estimate the normals of 3D models more robustly.
2.3 Learning-based normal estimation
Following the recent success of deep learning in a wide range of domains, deep learning based approaches have also found their way into surface normal estimation. Boulch et al. [Boulch2016Deep] pioneered the use of Convolutional Neural Networks (CNNs) to regress point normals: they apply a CNN to Hough-transformed point clouds in order to find surface planes of the point cloud in Hough space. However, this method is based on image input and does not use the point data directly. More recently, with the advent of graph neural networks and geometric deep learning [Bronstein2017Geometric], Qi et al. [qi2017pointnet] proposed PointNet to learn features directly from point data. Inspired by PointNet, Guerrero et al. [guerrero2018pcpnet] proposed a deep multi-scale architecture for surface normal estimation. Unlike PointNet, they use patch point data as input and show that a quaternion transformation is more useful for the normal estimation task and that mean pooling of features works better. However, their method does not perform well at scale selection because it simply concatenates the multi-scale features of the network. Later, Ben-Shabat et al. [ben2019nesti] proposed a mixture-of-experts architecture, which relies on a data-driven approach for selecting the optimal scale around each point and encourages sub-network specialization. However, because it uses Fisher vectors as the input representation, their network cannot obtain good results at a single scale, and information may be lost when point statistics are computed only on a coarse Gaussian grid. In addition, we find that even though the final normal estimates can be improved by multi-scale joint decision making, the trained network itself cannot obtain the best results at every sampled scale. A single-scale sampled patch always contains error points, as described above. None of the above methods considers this common situation in the neighbor sampling stage.
In this paper, we use a Local Plane Feature Constraint to obtain plane-aware point-wise features in a patch, so that the final global features are more robust to error points. We can also use the subnetwork to obtain a cleaner patch that contains only points on the main plane. Finally, we employ a multi-scale selection method to adaptively choose the best scale for each point.
Our goal in this work is to estimate point normals from a point cloud using a feature constraint that enhances regularity together with a scale selection strategy. Similar to PCPNet [guerrero2018pcpnet], given a 3D point cloud, the input of our network consists of local patches of this point cloud, each centered at a point and having a fixed radius (scale) proportional to the point cloud's bounding box extent. For the single-scale network, our architecture consists of two main parts. First, given sampled neighbors at a fixed scale, we learn plane-aware features on each point. The main difference from PCPNet [guerrero2018pcpnet] is that we propose a new branch in our network to ensure that it can extract plane-aware features. Finally, a multi-task loss is used to identify the important region of the input and to estimate the local normal at the same time. For multi-scale input, we also select the most appropriate scale for each point by employing a scale selection network. Figure 1 and Figure 2 show the single-scale feature constraint network and the multi-scale selection network, respectively. More details are described in the following subsections.
Similar to PCPNet [guerrero2018pcpnet], for each local patch sampled at a center point, we first translate the patch into a local frame relative to its centroid and normalize its size to unit scale: we subtract the coordinate of the patch centroid from every point of the patch and divide the result by the fixed scale radius. Our network takes a fixed number of points as input. Patches with too few points are padded with replicated points from the patch, which we believe keeps the point distribution better balanced than padding with zero points, and for patches with too many points we pick a random subset.
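The pre-processing described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `prepare_patch`, the fixed random seed, and computing the centroid before resampling are hypothetical choices, not details confirmed by the text.

```python
import random

def prepare_patch(neighbors, radius, num_points, rng=None):
    """Center a patch at its centroid, scale by 1/radius, fix its size."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    n = len(neighbors)
    centroid = [sum(p[k] for p in neighbors) / n for k in range(3)]
    normalized = [
        tuple((p[k] - centroid[k]) / radius for k in range(3))
        for p in neighbors
    ]
    if n > num_points:
        # Too many points: keep a random subset.
        return rng.sample(normalized, num_points)
    # Too few points: pad with replicated points of the same patch
    # instead of zero points.
    extra = [rng.choice(normalized) for _ in range(num_points - n)]
    return normalized + extra

small = prepare_patch([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], radius=2.0, num_points=5)
big = prepare_patch([(float(i), 0.0, 0.0) for i in range(10)], radius=1.0, num_points=4)
```

Padding with replicated points keeps every padded entry inside the original patch geometry, whereas zero padding would add artificial points at the origin of the local frame.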
3.2 Plane-aware Features Constraint
Given an input patch centered at a point, previous methods regress the local normal directly. In contrast, we apply a feature constraint strategy that extracts plane-aware features in order to estimate the point normal more stably. During training, each patch has the real normal at its center point as ground truth, and the goal is to train a network whose predicted normal is as close as possible to the real normal. In our work, we impose stronger constraints on the normal estimation network at the point-wise level instead of at the patch level. Specifically, for training, we need the real normal not only at the center point but also at every other point in the neighborhood, that is, one real normal for each point in the patch. We then calculate the distance between each point normal and the center normal, and use this distance to measure how far each point deviates from the real plane.
Here, the reference normal is the real normal of the center point, and the error vector has one entry for each point in the patch. Intuitively, within one local point cloud, if the normal at a point is consistent with that of the central point, we generally assume that the point lies on the fitting plane, and such a point is more important. We then normalize the error values of each patch.
We use a default threshold to choose the main part of each patch as ground truth, and the Main Plane Estimation can be considered a classification problem that predicts the plane. The ground truth of the plane is defined per point by this threshold.
A positive label means that the point is a plane point. For small scales, we need to change this threshold to weaken the plane constraint, and we use 0.8 for small scales (0.03). We learn point-wise features by classifying the input patch points into two parts, namely the main plane part and the error part. By constraining the feature learning process in our network, we obtain more meaningful features for regressing the normal of a patch. This binary classification constraint helps the point network extract regular features that contain more plane information about the patch. We call this constraint strategy the Local Plane Features Constraint (LPFC). Figure 3 shows some estimation results from our network, and we can see that the constraint is meaningful for normal estimation.
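The construction of the plane labels can be sketched as follows, with several assumptions stated loudly because the exact formulas are not recoverable from the text: the error is taken to be the Euclidean distance between unit normals, the normalization divides by the largest error in the patch, and a point is labeled as a plane point when its normalized error is below the threshold. The function name `plane_labels` is hypothetical.

```python
import math

def plane_labels(normals, center_normal, threshold):
    """Binary plane labels for one patch (1 = plane point, 0 = error point).

    Assumptions (not confirmed by the text): Euclidean distance between
    normals as the error, division by the patch maximum as normalization,
    and 'error below threshold' as the plane criterion.
    """
    errors = [math.dist(n, center_normal) for n in normals]
    max_error = max(errors)
    if max_error == 0.0:
        # Every normal agrees with the center normal: all points are planar.
        return [1] * len(normals)
    normalized = [e / max_error for e in errors]
    return [1 if e < threshold else 0 for e in normalized]

# Two points on the plane of the center point and one across a sharp edge.
labels = plane_labels(
    normals=[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    center_normal=(0.0, 0.0, 1.0),
    threshold=0.8,  # the value the text reports for small scales
)
```

Under these assumptions, raising the threshold labels more points as plane points, which matches the stated intent of weakening the plane constraint at small scales.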
3.3 Single Scale Network
Our network follows the PointNet architecture and is similar to PCPNet [guerrero2018pcpnet], but we add a new plane estimation subnetwork to constrain the learned point-wise features. The plane-aware features are then aggregated into a global feature of the local patch, which is used to regress the local normal. An overview of the architecture is shown in Figure 1.
Local Plane Features Constraint.
In the point-wise feature constraint part of the network, we use several perceptron layers as a classifier. This constrains the point-wise features: some points carry main plane information, while others carry error plane information. Through back-propagation, the features of each point come to contain more information related to the normal fitting plane. The subnetwork is also shown in Figure 1.
Normal Estimation. For the normal estimation part of the network, we also use a quaternion spatial transformer to transform the input patch to a canonical pose. Here, a subnetwork estimates the quaternion that parameterizes a local spatial transform. The output quaternion can be converted to a rotation matrix, and the rotated patch is then fed into the normal estimation network. One important property of the network is that it should be invariant to the ordering of the input points. Qi et al. [qi2017pointnet] show that this can be achieved by applying a set of functions with shared parameters to each point separately and then combining the resulting values for each point using a symmetric operation. In our work, we use a weighted mean pooling function to obtain a patch feature that provides a rich description of the patch. This global feature is also used in the main plane estimation to constrain the global and point-wise features. We use a three-layer fully connected network to perform the normal regression, as shown in the top part of Figure 1.
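The conversion from a quaternion to a rotation matrix mentioned above follows a standard formula, sketched here for illustration. The (w, x, y, z) component ordering and the function names are assumptions of this example; in the network, the quaternion would be predicted by the subnetwork rather than given by hand.

```python
import math

def quaternion_to_rotation(q):
    """Rotation matrix of a quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm  # enforce unit length
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotate(points, rot):
    """Apply a 3x3 rotation matrix to every point of a patch."""
    return [
        tuple(sum(rot[a][b] * p[b] for b in range(3)) for a in range(3))
        for p in points
    ]

# A 90 degree rotation about the z axis maps the x axis onto the y axis.
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
rotated = rotate([(1.0, 0.0, 0.0)], quaternion_to_rotation(q))[0]
```

Normalizing the quaternion first guarantees a valid rotation even when the raw prediction is not exactly of unit length.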
The Multi-task Loss. The network then diverges into two different branches performing two tasks: predicting the center normal of the input patch and constraining the point-wise features to better select the main part of the patch. The loss of our network is the sum of the losses of its two branches.
The normal prediction loss is computed, as usual, by minimizing the Euclidean distance between the estimated normals and the ground truth normals.
Here, the loss is taken over the total number of points in the point cloud. We also employ a discriminative function to constrain the learning of the point-wise patch features. This loss improves the robustness of the network's point-wise feature learning and also yields an effective global feature for estimating the final normal. It can be seen as a binary classification problem that distinguishes fitting plane points from error points in a patch, and it is defined as a cross-entropy loss.
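The two-branch objective can be sketched for a single patch as follows. This is an illustration under assumptions: the two terms are added without weighting (the text only says the loss is a sum), the cross-entropy is averaged over the patch points, and all function names are hypothetical.

```python
import math

def normal_loss(pred_normal, gt_normal):
    """Euclidean distance between predicted and ground truth normal."""
    return math.dist(pred_normal, gt_normal)

def plane_loss(plane_probs, plane_labels, eps=1e-7):
    """Mean binary cross-entropy over the points of one patch."""
    total = 0.0
    for p, y in zip(plane_probs, plane_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(plane_labels)

def multi_task_loss(pred_normal, gt_normal, plane_probs, plane_labels):
    """Unweighted sum of both branch losses (the weighting is an assumption)."""
    return normal_loss(pred_normal, gt_normal) + plane_loss(plane_probs, plane_labels)

good = multi_task_loss((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), [1.0, 0.0], [1, 0])
bad = multi_task_loss((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), [0.2, 0.9], [1, 0])
```

A correct normal with correct plane probabilities gives a loss close to zero, while errors in either branch increase the total.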
3.4 Multi-Scale Normal Estimation
Figure 2 gives the pipeline of our multi-scale architecture, which adaptively selects the best normal from the single-scale results. In our method, the normal is finally estimated using S separate normal estimation networks. Each is a single-scale normal estimation network as introduced in the previous section: it estimates the patch normal at the center point and also estimates the planar points of the patch. The multi-scale adaptive selection network uses subnetworks from small scale to large scale. As shown in the third part of Figure 2, each scale network outputs a three-element normal vector. The plane estimation constraints are also applied at each scale. The global features obtained from the subnetworks are concatenated into one vector, which is fed to the Scale Estimation Network to obtain the weights of the S scale normals. The final predicted normal (for the center point) is the one with the largest weight, i.e. the normal associated with the subnetwork expected to give the best results. For the multi-scale network, we also train the network to minimize the difference between the predicted normal and the ground truth normal. Finally, we minimize a total loss over all scales.
Here, each term is the loss used for normal estimation at a single scale. Using this loss, each scale normal estimation network is rewarded for specializing in a specific input type. Note that during training, the normal vectors of all scales are predicted and used to compute the loss and derivatives. However, at test time, we compute only one normal, which is associated with the maximal weight.
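The selection step can be sketched as follows. Two parts of this example are assumptions rather than details given in the text: the scale weights are produced with a softmax over hypothetical scale scores, and the training objective is written as a weight-averaged sum of per-scale normal losses.

```python
import math

def softmax(scores):
    """Numerically stable softmax over the per-scale scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_normal(scale_normals, scale_scores):
    """Test time: return the normal of the scale with the largest weight."""
    weights = softmax(scale_scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return scale_normals[best], weights

def weighted_scale_loss(scale_normals, weights, gt_normal):
    """Assumed training objective: weight-averaged per-scale normal losses."""
    return sum(w * math.dist(n, gt_normal)
               for n, w in zip(scale_normals, weights))

normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
chosen, weights = select_normal(normals, [0.1, 2.0, -1.0])
```

With a loss of this form, a scale only contributes to the gradient in proportion to its weight, which is one way each subnetwork can be encouraged to specialize.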
4 Evaluation and Discussion
4.1 PCPNet dataset
Our method is trained and validated quantitatively on the PCPNet dataset provided by Guerrero et al. [guerrero2018pcpnet]. It contains a mixture of point clouds sampled from man-made objects and high-resolution scanned models such as the bunny. Each point cloud consists of 100k points. We reproduce the experimental setup of [guerrero2018pcpnet; ben2019nesti], training on the provided split containing 32 point clouds under different levels of noise. The test set consists of six categories: four sets with no noise, small noise, medium noise, and high noise, and two sets with different sampling density (striped pattern and gradient pattern). Following related work, the Root Mean Squared Error (RMSE) on the provided test set is used as the performance metric, where the RMSE is first computed for each test point cloud and the results are then averaged over all point clouds in one category.
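The evaluation protocol can be sketched as follows: an angular error per point, an RMSE per point cloud, and an average over the point clouds of a category. Treating the normals as unoriented by taking the absolute value of the dot product, and reporting the error in degrees, are assumptions of this sketch; the function names are hypothetical.

```python
import math

def angle_error_deg(pred, gt):
    """Unoriented angular error in degrees between two normals."""
    dot = sum(a * b for a, b in zip(pred, gt))
    norm = math.sqrt(sum(a * a for a in pred)) * math.sqrt(sum(b * b for b in gt))
    cosine = min(1.0, abs(dot) / norm)  # abs() ignores the normal's sign
    return math.degrees(math.acos(cosine))

def shape_rmse(preds, gts):
    """RMSE of the angular error over all points of one point cloud."""
    squared = [angle_error_deg(p, g) ** 2 for p, g in zip(preds, gts)]
    return math.sqrt(sum(squared) / len(squared))

def category_rmse(shapes):
    """Average of the per-shape RMSE values; shapes is a list of (preds, gts)."""
    return sum(shape_rmse(p, g) for p, g in shapes) / len(shapes)

gts = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]  # flipped normal, then a 90 degree miss
rmse = shape_rmse(preds, gts)
```

Computing the RMSE per shape before averaging keeps shapes with many difficult points from dominating the category score.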
4.2 Training details
The variants of our network are trained using all patches sampled from 32 shapes, comprising 8 point cloud models at 4 levels of Gaussian noise. For each sample point, we randomly extract 500 neighboring points enclosed within a sphere of the given radius. We train our single-scale network with patch sizes of 0.01, 0.03, and 0.05, respectively. Finally, we also use a learned multi-scale selection approach to refine the normals based on the results of the multiple scales together. If a neighborhood within the radius has more than 500 points, we randomly sample from the local subset of points; for neighborhoods with fewer points, we repeatedly and uniformly sample points from the local subset. We train our network with PyTorch [paszke2017automatic] on a single 1080 Ti GPU.
Similar to PCPNet [guerrero2018pcpnet], we train our networks (single-scale and multi-scale) for up to 2000 epochs, until convergence, on the PCPNet dataset. A full randomization of the dataset, mixing patches of different shapes in each batch, was vital to achieve stable convergence. All training was performed using stochastic gradient descent with a batch size of 64 in the single-scale training stage and 16 in the multi-scale training stage. The learning rate is and momentum is set to . In the evaluation stage, we estimate the unoriented normals of all points on the models and compute the error with respect to the ground truth.
As discussed above, to compare our method with other deep learning methods and some classic geometric methods, we measure the angle difference between our predicted normals and the ground truth using the RMSE metric. In the experiments, we evaluate the improvement that the local plane feature constraint brings to our method at a single scale, and we also show strong performance at multiple scales.
Table 1 compares unoriented normal estimation for the single-scale methods discussed above. The top rows of the table show the results for varying levels of noise, from zero noise to high noise. The two rows in the middle show the results for point clouds with a non-uniform sampling rate. For each category, we show the average over all shapes in that category. The last row shows the global average error over all shapes. As shown in Table 1, we slightly improve on the state of the art at most single scales across the different noise levels and varying densities, and the improvement becomes more evident as noise is added. The reason for this improvement is that the plane-based classification helps our network extract more robust features that are more strongly correlated with the local normal and that better suppress the meaningless disturbance produced by the noise. In Table 1, we can see that our method obtains slightly better results when the patch scale is 0.03 or 0.05. In addition, with little or no noise, we also achieve results competitive with PCPNet. The second and third rows of Figure 4 also show the improvements over the single-scale results of PCPNet [guerrero2018pcpnet].
In addition, we remove the organic objects from the test data of the PCPNet dataset, as shown in Figure 5. Here, we remove five models (the models on the right). We find that our method improves normal estimation for man-made objects. Table 2 shows the results without the organic objects: at every noise level, our method improves the final estimate at each scale. We obtain more robust results than simply regressing the patch normals, because the point-wise feature constraint helps extract more robust features. Our method is better at preserving plane information, especially when models have higher noise. However, as shown in Table 1, on the whole dataset our method shows no obvious improvement at the small scale (0.01). A possible reason is that false planes are rare at small scales, so our method yields larger improvements for patches sampled at large scales.
We also report the results of multi-scale normal estimation in Table 3. We can observe the following general trends in these results. First, our methods consistently outperform competing techniques across all noise levels, and, compared with the single-scale results in Table 1, multi-scale selection improves our results over any single scale. Our method outperforms almost all other methods across all noise levels and most density variations. In particular, compared with PCPNet [guerrero2018pcpnet], the base network we build on, our approach greatly improves the normal estimation results over the simple concatenation strategy used in PCPNet [guerrero2018pcpnet]. Qualitative comparisons of the normal error on four shapes of our dataset are shown in Figure 6. Note that for classical surface fitting, small patch sizes work well on detailed structures, as in the bottom row, but fail on noisy point clouds, while large patches are more tolerant to noise but smooth out surface detail. Our method obtains more consistent results than classic methods. In addition, Figure 6 shows that our multi-scale normal estimation method performs well compared with PCPNet [guerrero2018pcpnet].
4.4 Plane Constraints Results
In our experiments, we also show the results of our local plane classification. In Figure 3, from top to bottom, we show the results for varying levels of noise, from zero noise to high noise. Our network accurately distinguishes the set of points on the plane that contains the center point from the other points, and the results remain robust across noise levels. In addition, when a patch has a sharp edge, our network divides the points into two categories with a clear boundary. This constraint improves feature learning on the input patch and yields a plane-aware feature for estimating the normal of a patch at any scale. Figure 4 shows that our LPFC improves the single-scale results at both the 0.03 and 0.05 scales on the left test models shown in Figure 5. However, as shown in Table 2, on the whole dataset our method shows no obvious improvement at the small scale (0.03). A possible reason is that error points are rare at small scales.
4.5 Scale selection performance
As shown in Table 1, as noise increases, our method improves normal estimation with the single-scale architecture. The final evaluation results at different noise levels are affected by the choice of scale, which means that the choice of scale has an important impact on normal estimation. In addition, the first row of Figure 4 shows that our scale selection method greatly improves the estimated results compared with the single-scale results in the second and fourth rows of Figure 4. The final errors in Table 3 also show that the scale selection strategy greatly improves our results. In this section, we show how the scales affect the final estimated normal. Figure 7 shows, from left to right, the labels of the scales selected at different noise levels. Three colors (blue, yellow and red) correspond to the three scales, respectively. First, our multi-scale network tends to choose small-scale results in the case of low-level noise, because the point network can obtain accurate information from small local patches, which may contain more relevant information. Where a part has rich local details, our network selects medium- and small-scale results, as shown in the example marked by dotted lines (rows 1-2 in the first column). For boundary parts, the small scale is generally chosen, because a small neighborhood can better identify the surface on which the current center point lies (rows 3-4 of Figure 7). Then, as noise increases, our network tends to select the medium and large scales, so that more information is available and the errors introduced by noise and outliers are avoided. The first two columns of Table 3 consistently show that large-scale sampling improves normal estimation when the noise in the models increases.
As shown in Table 3, when the scales are chosen from 0.03, 0.05, and 0.07, the normals estimated at the high noise level are slightly improved by the multi-scale selection method. Furthermore, we compare the visual results of different multi-scale combinations. As shown in Figure 8, we use the scales 0.03, 0.05 and 0.07 as the input of our networks. Compared with Figure 7, when the minimum scale (0.03) is removed, the network loses the ability to select small scales, which weakens its performance on models with small noise. However, the combination with larger scales performs better on models with large noise owing to the introduction of a larger scale. The same conclusion can be drawn from Table 3. Finally, our proposed multi-scale network tends to choose a larger scale in sparse regions, as shown in the last column of Figure 7.
In addition, the distribution of the scales selected for the points is shown as histograms in Figures 7 and 8. These histograms show that, for the PCPNet dataset, choosing scales near the small end more often benefits the final estimation result. However, owing to limitations of our equipment, we do not compare selection results with more scales.
In this work, we propose a normal estimation method that uses a local plane features constraint strategy and selects the best result from multiple scales. The plane feature constraint mechanism improves single-scale results, especially when the input of the network is a large-scale patch. The local plane features constraint strategy reduces the error caused by sampling error plane points. In addition, the multi-scale selection network enables the prediction of an optimal local scale, and our experiments analyze the relationship between scale selection and point cloud normal estimation. Noise and local structure may both affect the best scale for normal estimation. The proposed method achieves state-of-the-art results relative to the compared methods and demonstrates robustness to noise.
We would like to thank all the reviewers for their valuable comments and feedback. This work was supported by NSFC (Nos. 61702079, 61632006 and 61562062).