
NormalNet: Learning based Guided Normal Filtering for Mesh Denoising

03/10/2019
by   Wenbo Zhao, et al.

Mesh denoising is a critical technology in geometry processing that aims to recover high-fidelity 3D mesh models of objects from their noise-corrupted versions. In this work, we propose a deep learning based face normal filtering scheme for mesh denoising, called NormalNet. Unlike natural images, meshes are difficult to collect in sufficient quantity to build a robust end-to-end training scheme for deep networks. To remedy this problem, we propose an iterative framework to generate enough face-normal pairs, based on which a convolutional neural network (CNN) based scheme is designed for guidance normal learning. Moreover, to facilitate the 3D convolution operation in CNNs, for each face in the mesh we propose a voxelization strategy that transforms the irregular local mesh structure into a regular 4D array. Finally, guided normal filtering is performed to obtain filtered face normals, from which the denoised vertex positions are derived. Compared to state-of-the-art works, the proposed scheme generates accurate guidance normals and removes noise effectively while preserving original features and avoiding pseudo-features.




1 Introduction

Recently, the demand for high-fidelity 3D mesh models of real objects has grown in many domains, such as computer graphics, geometric modeling, computer-aided design and the movie industry. However, due to the limited accuracy of scanning devices, raw mesh models are inevitably contaminated by noise, which corrupts features and profoundly affects the subsequent applications of the meshes. Hence, mesh denoising has become an active research topic in geometry processing.

Mesh denoising is an ill-posed inverse problem. Its goal is to smooth a noisy surface while preserving the real features of the object and without introducing unnatural geometric distortions. Mesh denoising is challenging, especially for large and dense meshes and for high noise levels. The key to success is to differentiate actual geometric features, such as localized curvature changes and small-scale details, from the noise generated by scanners. The literature on mesh denoising is rich, including filtering-based [1, 2], feature-extraction-based [3, 4], optimization-based [5, 6] and similarity-based [7, 8, 9] methods, among others. Among them, the two-stage approach has become popular in recent years: smoothed face normals are first derived, and the vertex positions are then updated to integrate these normals with respect to some objective function [10, 11, 12, 13]. This approach handles mesh denoising either by introducing priors, such as L1-norm sparsity [12], or by exploiting redundant information from the corrupted mesh itself [10, 11]. However, since meshes have a variety of irregular structures, a given prior assumption is not guaranteed to always hold, and inaccurate information from the noisy mesh also limits denoising performance. In these cases, as shown in our experimental comparisons, even state-of-the-art denoising approaches cannot produce satisfactory results.

In the counterpart problem of 2D image denoising, deep learning based strategies have been widely applied with great success [14, 15, 16]. For mesh denoising, however, to the best of our knowledge, no works follow this line. Two main difficulties prevent the use of convolutional neural networks (CNNs) in mesh denoising:

  • Meshes have complex connectivity among vertices. It is challenging to collect enough mesh examples to build an effective end-to-end learning model while avoiding over-fitting.

  • In contrast to the regular grid structure of 2D images, meshes have irregular topology. It is not straightforward to apply the regular 3D convolution kernels of CNNs to meshes.

In this work, we propose a deep learning based face normal filtering scheme, called NormalNet, which is the first work in the literature using CNNs for mesh denoising. NormalNet is tailored to overcome the above difficulties:

  • For the first problem, instead of the modern end-to-end architecture, we follow the two-step framework of [10], which first generates the guidance normals and then updates the vertex positions. In our method, the guidance normals are derived from the proposed CNN-based learning model.

  • For the second problem, a voxelization strategy is proposed to convert irregular local mesh structure into regular 4D-array form, which can then be processed by convolution operation.

Fig. 1: The framework of the proposed NormalNet. Modules in all iterations share the same workflow: for a face, its local 3D structure is converted into a regular 4D form by voxelization, which is then fed into the CNN-based learning model to derive the guidance normal. Guided normal filtering and vertex position updates are then performed to obtain the denoised mesh.

In NormalNet, the ground-truth normals are regarded as the target guidance normals in training. Since mesh structures change considerably during filtering-based denoising, it is difficult to obtain a single CNN that works well in every iteration. Instead, NormalNet contains multiple CNNs corresponding to multiple iterations. To generate the training set for a specific iteration, we collect several meshes, add noise, apply filtering to them, and then select a number of faces from the noisy meshes. To ensure that the training set contains enough feature structures, faces are divided into several classes according to their shape features and are selected evenly from the classes. Since the ground truth of real scanned meshes is difficult to obtain, we train our network only on synthesized examples, and show by experiments that our model generalizes and works well on both synthesized and real scanned meshes. Compared to the state-of-the-art schemes, the proposed scheme generates accurate guidance normals and removes noise effectively while preserving original features and avoiding pseudo-features.

The rest of this paper is organized as follows: Section 2 briefly summarizes related works; Section 3 introduces the proposed scheme; Section 4 elaborates the training of NormalNet; Section 5 presents experimental results; and Section 6 concludes the paper.

2 Related works

In this section, we briefly review related works on filtering-based mesh denoising and neural networks based 3D model processing.

2.1 Filtering-based Mesh Denoising

Owing to the edge-preserving property of the bilateral filter, researchers have made many attempts to adopt bilateral filtering in mesh denoising [17, 18, 19]. Nevertheless, the photometric weight of the bilateral filter cannot be estimated accurately from a noise-corrupted mesh. The joint bilateral filter [20] was later proposed to improve the capability of bilateral filtering by computing the photometric weight from a reliable guidance image. Inspired by this idea, Zhang et al. [10] successfully applied joint bilateral filtering to mesh denoising, obtaining the guidance information as the average normal of a local patch. This scheme preserves features well, but cannot achieve satisfactory results in regions with complex shapes and sometimes introduces pseudo-features. To overcome the limitations of [10], a subsequent work [11] computes the guidance normals from a corner-aware neighborhood that adapts to the shapes of corners and edges.

Recently, there have been increasing efforts to exploit geometric attributes for mesh denoising. In [21], normal filtering was performed by a total variation (TV) based denoising scheme, which assumes that the normal field is piecewise constant. Wei et al. [2] proposed to cluster faces into piecewise smooth patches and refine face normals with the help of vertex normal fields. Inspired by [5], Lu et al. [12] applied an additional vertex filtering step before L1-median face normal filtering, which was shown to handle high noise levels and noise distributed in random directions. In [13], the Tukey bi-weight similarity function was proposed to replace the similarity function in the computation of bilateral weights; in addition, an edge-weighted Laplace operator was introduced for vertex updating to reduce face normal flips.

It is worth noting that, to the best of our knowledge, no existing work uses deep neural networks for filtering-based mesh denoising. In [22], Wang et al. proposed to model a regression function with neural networks; however, these networks are not convolutional and have only three layers.

2.2 Neural Networks based 3D Model Processing

Driven by the great success of deep learning in image processing, researchers in graphics are also trying to employ deep neural networks for 3D model processing. However, because of their irregular connectivity, processing 3D models with neural networks is still challenging. Numerous works focus on transforming 3D models into regular data. For instance, in [23, 24], 3D models were represented by 2D rendered images and panoramic views, respectively. Besides, some works [25, 26, 27, 28] voxelized models to transform them into regular 3D data. Moreover, in [29, 30, 31, 32], meshes were represented in the spectral or spatial domain for further processing.

In addition to these transform-based techniques, running neural networks directly on irregular data has also been extensively studied. PointNet [33] was one of the first network architectures that can handle point cloud data. Subsequently, in [34], a pointwise convolution operator was proposed that extracts features at each point. In [35], a kd-tree was constructed on the point cloud and used as the input of the neural network. A similar idea appears in [36], in which the points are organized by an octree. However, these schemes are designed and validated for point cloud data, and none of them addresses mesh denoising.

3 The Proposed NormalNet

In this section, we elaborate the proposed learning based guided normal filtering for mesh denoising.

3.1 Overview

Fig. 2: Illustration of the steps in voxelization. For a face in mesh, a 2-ring patch is built around it. Two matrices that represent rotation and translation are computed for normalization. The irregular 3D structure around this face is split into small cubes. A label is then assigned to each cube, which is the average normal of the faces contained by this cube.

We first give an overview of the proposed learning based scheme for mesh denoising. As illustrated in Fig. 1, the proposed method consists of a series of iteratively performed modules. Since mesh structures change considerably during iterations, it is difficult to obtain a single CNN that works well for all iterations. Training a separate CNN for every iteration, on the other hand, would impose a prohibitive computational burden. Instead, we train multiple CNN models adapted to the noise levels of different iterations. Specifically, since the structural changes between successive iterations are remarkably large in the first few iterations, we train a separate CNN model for each of these iterations; for the remaining iterations, a single shared CNN model is used. The threshold iteration number is determined experimentally.
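The per-iteration model schedule described above amounts to a small lookup. The sketch below is illustrative only: the function name `model_index`, the 1-based iteration count and the 0-based model indices are hypothetical choices, and `threshold` stands for the experimentally chosen threshold iteration number.

```python
def model_index(iteration: int, threshold: int) -> int:
    """Return which CNN to use at a given (1-based) denoising iteration.

    Iterations 1..threshold each get their own model (indices 0..threshold-1);
    every later iteration reuses one shared model (index `threshold`).
    """
    if iteration < 1:
        raise ValueError("iterations are counted from 1")
    return min(iteration, threshold + 1) - 1


def num_models(threshold: int) -> int:
    """Total number of CNNs to train: one per early iteration plus one shared model."""
    return threshold + 1
```

With a threshold of 3, for example, iterations 1, 2 and 3 use models 0, 1 and 2, and every iteration from 4 onward uses model 3.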

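All modules share the same workflow. First, to facilitate the convolution operation, for each face $f_i$ in the mesh we transform the irregular local mesh structure around it into a regular 4D array using the proposed voxelization strategy. The output of voxelization is then fed into the proposed CNN-based learning scheme to estimate an accurate guidance normal $\mathbf{g}_i$. Finally, guided normal filtering is applied to derive the filtered face normal $\mathbf{n}_i'$:

$$\mathbf{n}_i' = \frac{1}{K_i} \sum_{f_j \in \Omega_i} A_j \, K_s(\mathbf{c}_i, \mathbf{c}_j) \, K_r(\mathbf{g}_i, \mathbf{g}_j) \, \mathbf{n}_j, \qquad (1)$$

where $\mathbf{c}_k$, $\mathbf{n}_k$ and $\mathbf{g}_k$ denote the center, normal and guidance normal of face $f_k$; $\Omega_i$ is the set of geometrically neighboring faces of $f_i$, selected using the method in [10]; $A_j$ is the area of $f_j$; $K_i$ is a normalization factor ensuring that $\mathbf{n}_i'$ is a unit vector; and $K_s$ and $K_r$ are Gaussian kernels [37], computed by:

$$K_s(\mathbf{c}_i, \mathbf{c}_j) = \exp\left(-\frac{\|\mathbf{c}_i - \mathbf{c}_j\|^2}{2\sigma_s^2}\right), \qquad (2)$$
$$K_r(\mathbf{g}_i, \mathbf{g}_j) = \exp\left(-\frac{\|\mathbf{g}_i - \mathbf{g}_j\|^2}{2\sigma_r^2}\right), \qquad (3)$$

where $\sigma_s$ and $\sigma_r$ are the Gaussian function parameters. The vertex positions are then updated according to the filtered normals, following the idea of [38]. Face normal filtering and vertex position updating are repeated for a prescribed number of iterations each, and the denoised mesh is finally obtained.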
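The guided normal filter of Eqs. (1)-(3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the array layout, the function names, and the assumption that each face's neighbor list is precomputed (and normally includes the face itself) are hypothetical. Dividing the weighted sum by its norm plays the role of the normalization factor, since that factor only serves to make the result a unit vector.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """exp(-||x - y||^2 / (2 sigma^2)), evaluated row-wise when y has several rows."""
    d2 = np.sum((np.asarray(x) - np.asarray(y)) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def guided_normal_filter(centers, normals, guidance, areas, neighbors, sigma_s, sigma_r):
    """One pass of guided normal filtering over all faces.

    centers, normals, guidance: (F, 3) float arrays (face centers, face normals,
    guidance normals); areas: (F,) face areas; neighbors[i]: integer indices of
    the neighboring faces of face i (hypothetical precomputed input).
    """
    filtered = np.empty_like(normals)
    for i, nbr in enumerate(neighbors):
        nbr = np.asarray(nbr)
        # Area-weighted spatial kernel times range kernel on guidance normals.
        w = (areas[nbr]
             * gaussian_kernel(centers[i], centers[nbr], sigma_s)
             * gaussian_kernel(guidance[i], guidance[nbr], sigma_r))
        acc = (w[:, None] * normals[nbr]).sum(axis=0)
        filtered[i] = acc / np.linalg.norm(acc)  # normalize to a unit vector
    return filtered
```

Because the range kernel is evaluated on the guidance normals rather than on the noisy normals, neighbors whose guidance differs strongly (for example across a sharp edge) receive almost no weight, which is what lets the filter preserve features.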

3.2 Voxelization of Mesh

The key to the success of our CNN-based mesh denoising scheme is to transform the local irregular structure around a face into regular form, such that the convolution operation in CNNs can be easily performed, while preserving shape features and face normal information.

To this end, as illustrated in Fig. 2, for a face in the mesh, the local 3D structure around it is split into regular small cubes, and each cube is assigned a label equal to the average normal of all faces it contains. Before splitting, a normalization consisting of a rotation and a translation is applied to improve the robustness of voxelization, so that all faces are brought to a similar orientation and position. Specifically, a 2-ring patch is built around the face and the average normal of this patch is computed. We then compute two matrices: one representing the rotation from this average normal to a fixed reference direction, and one representing the translation from the face center to the origin (0,0,0). The whole mesh is rotated and translated according to these two matrices; for a vertex of the mesh, its new position after rotation and translation is:

(4)
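The normalization step can be sketched as below. This assumes, hypothetically, that the reference direction is the +z axis and that the face center is moved to the origin before rotating, so the center stays at the origin; the rotation is built with Rodrigues' formula. Function names are illustrative.

```python
import numpy as np


def rotation_to(a, b):
    """Rotation matrix R with R @ a_hat = b_hat (Rodrigues' formula)."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Opposite vectors: rotate by pi about any axis orthogonal to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])  # cross-product matrix of v
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def normalize_patch(vertices, face_center, patch_avg_normal, target=(0.0, 0.0, 1.0)):
    """Translate the face center to the origin, then rotate the patch normal onto `target`."""
    R = rotation_to(patch_avg_normal, target)
    return (np.asarray(vertices, float) - face_center) @ R.T
```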

After normalization, the local mesh structure around the face is restricted to a bounded region around the origin, whose extent is decided by a range parameter. The remaining issue is to determine the size of the small cubes. In our work, the side length of the cubes is computed as:

(5)

where is the average distance between adjacent faces and , is a parameter that controls the size of the small cubes. The center of is set as .

For each cube, we employ the fast 3D triangle-box overlap test presented in [39] to select the faces that overlap it. If at least one face is selected, the label of the cube is the average normal of all selected faces; otherwise, the label is set to (0,0,0). In this way, the irregular local mesh structure is converted into a regular 4D array (three spatial dimensions plus a 3-component normal label per cube). In the practical implementation, the region is chosen large enough to contain enough faces, and hence enough cubes. The output of voxelization is then used as the input of the network introduced in the next subsection.
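The cube labelling can be sketched as below. Two simplifications are assumed and should be read as hypothetical: the grid consists of k × k × k cubes of a given side length centered at the origin, and overlap is approximated by the triangle's axis-aligned bounding box. That approximation is conservative (it may flag cubes the triangle does not actually touch) and is not the exact triangle-box test of [39].

```python
import numpy as np


def voxelize(vertices, faces, face_normals, k, side):
    """Return a (k, k, k, 3) array of cube labels.

    Each label is the mean normal of the faces whose bounding box overlaps the
    cube, or (0, 0, 0) if there are none. The grid spans [-k*side/2, k*side/2]^3.
    """
    half = k * side / 2.0
    sums = np.zeros((k, k, k, 3))
    counts = np.zeros((k, k, k))
    for f, n in zip(faces, face_normals):
        tri = vertices[f]  # (3, 3) corner coordinates of the triangle
        lo = np.floor((tri.min(axis=0) + half) / side).astype(int)
        hi = np.floor((tri.max(axis=0) + half) / side).astype(int)
        if np.any(hi < 0) or np.any(lo >= k):
            continue  # the triangle lies entirely outside the grid
        lo, hi = np.clip(lo, 0, k - 1), np.clip(hi, 0, k - 1)
        block = (slice(lo[0], hi[0] + 1), slice(lo[1], hi[1] + 1), slice(lo[2], hi[2] + 1))
        sums[block] += n
        counts[block] += 1
    labels = np.zeros_like(sums)
    filled = counts > 0
    labels[filled] = sums[filled] / counts[filled][:, None]
    return labels
```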

3.3 CNN-based Guidance Normal Learning

Fig. 3: The framework of training data generation.

Using the 4D array generated by voxelization as input, a CNN-based model is employed to derive the guidance normal of the face. NormalNet is mainly inspired by the philosophy of ResNet [40] and VGGNet [41]. We use filters of the same size in most of the convolution layers and follow two simple design principles: i) layers with the same output feature map size have the same number of filters; and ii) the number of filters is doubled whenever the feature map size is halved, so as to preserve the computational complexity per layer.

The network architecture is shown in Fig. 4. The input layer is followed by three residual blocks with 32, 64 and 128 channels, respectively. In each residual block, we add a shortcut connection between two layers as in ResNet, to further exploit the information of earlier feature maps and to accelerate convergence. The two convolution feature maps are summed element-wise and passed to the next layer after rectification. Down-sampling is performed directly by convolution with a stride of 2 in the first layer of each residual block. In addition, a global max-pooling layer is adopted after the last convolution layer. The network ends with four fully connected layers: the first two have 1024 channels each, the third has 128 channels, and the fourth predicts the three coordinates of the face normal and thus has 3 channels. The total number of weighted layers is 25. The first 24 layers use ReLU activations, while the last layer uses Tanh to ensure that the output lies in [-1, 1].
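To make the shapes concrete, the sketch below tracks spatial size and channel count through the three residual blocks and runs the global max-pooling plus the four fully connected layers (1024, 1024, 128, 3) on a random feature map. It is shape-level only: batch normalization and the convolutions are omitted, the weights are random, the input resolution of 16 is a hypothetical value, and "same" padding is assumed for the stride-2 convolutions.

```python
import numpy as np


def feature_map_sizes(k, channels=(32, 64, 128)):
    """(spatial size, channels) after each residual block.

    Assumes each block starts with a stride-2 'same'-padded convolution,
    so the size becomes ceil(size / 2).
    """
    sizes = []
    for c in channels:
        k = -(-k // 2)  # ceiling division
        sizes.append((k, c))
    return sizes


def head_forward(fmap, weights, biases):
    """Global max-pool a (D, H, W, C) map, then FC layers with ReLU and a final Tanh."""
    x = fmap.max(axis=(0, 1, 2))  # global max-pooling over the spatial axes
    for layer, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        x = np.tanh(x) if layer == len(weights) - 1 else np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
dims = [128, 1024, 1024, 128, 3]
weights = [rng.normal(0.0, 0.05, (a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
normal = head_forward(rng.normal(size=(2, 2, 2, 128)), weights, biases)
```

Tanh only bounds each coordinate to [-1, 1]; the prediction still has to be divided by its norm to obtain a unit normal.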

Fig. 4: The architecture of deep network in NormalNet. There are three residual blocks in this network, which share the same structure as illustrated at the bottom.

4 NormalNet Training

In this section, we introduce in detail the training process of the proposed NormalNet, including i) the generation of the training set and ii) the training of the networks.

4.1 Generation of the Training Data

Unlike natural images, meshes are difficult to collect in sufficient numbers to build an end-to-end training scheme for deep networks. In our case the problem is even harder, since we train multiple CNNs for different iterations. Moreover, after the first iteration, the input mesh for training is the denoised result of the previous iteration, which cannot be generated simply by adding noise to a clean mesh.

To overcome these problems, we propose an iterative framework for training data generation, as illustrated in Fig. 3. Our goal is to generate a specific training set and a specific CNN for each iteration. Starting from the 12 synthesized mesh models shown in Fig. 5, Gaussian noise distributed along the normal direction, with noise levels ranging within a preset interval, is added to obtain the initial noisy examples. Noise of various levels is added to improve the robustness of the training process. Then a large number of faces are selected from the noisy examples, and the local structures around them are voxelized as described in Section 3.2 to generate the training set of the current iteration. The ground-truth normals are used as the target guidance normals, and a CNN model is trained on the resulting pairs of voxelized inputs and target normals. The denoised examples for the next iteration are obtained by applying guided normal filtering with the ground-truth normals as guidance. This process is repeated a certain number of times. Once the iteration number exceeds the threshold, the examples of all remaining iterations are mixed to generate a single training set.
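The generation of the initial noisy examples can be sketched as below, under assumptions that are common in the mesh denoising literature but not confirmed by this section: the noise standard deviation is the noise level times the mean edge length, and each vertex is displaced along its unit vertex normal. The level range passed in is hypothetical, since the actual range is not given here.

```python
import numpy as np


def mean_edge_length(vertices, faces):
    """Average length of the unique edges of a triangle mesh."""
    f = np.asarray(faces)
    edges = np.concatenate([f[:, [0, 1]], f[:, [1, 2]], f[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)  # count each shared edge once
    return float(np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1).mean())


def add_normal_noise(vertices, vertex_normals, level, edge_len, rng):
    """Move every vertex along its unit normal by an N(0, (level * edge_len)^2) offset."""
    offsets = rng.normal(0.0, level * edge_len, size=len(vertices))
    return vertices + offsets[:, None] * vertex_normals


def noisy_copies(vertices, faces, vertex_normals, level_range, n_copies, rng):
    """Several noisy versions of one clean mesh, each with a level drawn uniformly from level_range."""
    edge_len = mean_edge_length(vertices, faces)
    levels = rng.uniform(level_range[0], level_range[1], size=n_copies)
    return [add_normal_noise(vertices, vertex_normals, lv, edge_len, rng) for lv in levels]
```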

Fig. 5: The 12 mesh models used in training data generation.

It is worth noting that the faces selected from the noisy examples have various features: they may lie in smooth regions, on edges or at corners. Since faces on edges and corners are far fewer than faces in smooth regions, selecting faces uniformly at random would bias training toward smooth faces and lead the network to overfit them. Instead, we classify all faces into several categories and randomly select the same number of faces from each category, so that the training process is balanced across features. Specifically, we first detect the edges across which the normal difference between adjacent faces exceeds a threshold, and regard them as boundaries. Then, for each face, the 1-ring patch around it is divided into several sub-regions according to the detected boundaries. The number of sub-regions indicates the category of the face: one sub-region indicates a smooth face, two an edge face, and three or more a corner face.
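The face categorization can be sketched with a union-find over the 1-ring patch: adjacent faces are merged unless the edge between them is a boundary, and the number of resulting groups is the number of sub-regions. The input format (a list of patch faces and the pairs of patch faces sharing an edge) and the function names are hypothetical; face normals are assumed to be unit length.

```python
import numpy as np


def count_subregions(patch_faces, adjacency, normals, threshold_deg):
    """Number of sub-regions the 1-ring patch splits into once boundary edges are cut.

    adjacency: pairs (a, b) of patch faces that share an edge. The shared edge is
    a boundary if the angle between the two face normals exceeds threshold_deg.
    """
    parent = {f: f for f in patch_faces}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cos_t = np.cos(np.radians(threshold_deg))
    for a, b in adjacency:
        if np.dot(normals[a], normals[b]) >= cos_t:  # angle <= threshold: not a boundary
            parent[find(a)] = find(b)
    return len({find(f) for f in patch_faces})


def classify_face(n_regions):
    """1 sub-region -> smooth, 2 -> edge, 3 or more -> corner."""
    return "smooth" if n_regions == 1 else "edge" if n_regions == 2 else "corner"
```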

4.2 Network Training

We adopt batch normalization (BN) right after each layer and before activation. The loss function is the mean squared error (MSE). We initialize the weights with a truncated normal distribution and train the network from scratch. For optimization, we use the Adam algorithm with a mini-batch size of 100; its parameters are β1 = 0.9 and β2 = 0.999, with ε left at its default value, following the default settings in TensorFlow. The learning rate starts from 0.001 and decays exponentially every fixed number of training steps (e.g., 10000). Each iteration-specific CNN is trained individually to approximate the corresponding ground-truth normals.
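The learning-rate schedule and loss can be written out explicitly. This sketch follows the formula of TensorFlow's exponential decay, `initial_lr * decay_rate ** (step / decay_steps)`; the decay rate of 0.9 is a hypothetical value, and the stepwise ("staircase") behavior is assumed from the phrase "every fixed number of training steps".

```python
def exponential_decay(step, initial_lr=1e-3, decay_steps=10_000, decay_rate=0.9, staircase=True):
    """Learning rate initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops once every decay_steps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


def mse(pred, target):
    """Mean squared error between a predicted and a target normal (3 coordinates)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```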

The evaluation metric for the network is the angle between the output normal and the ground-truth normal. If this angle is less than a preset threshold, the prediction is considered correct; otherwise, it is considered incorrect. The average angular error over the entire test set is also used as a reference to measure the training results. Finally, we achieved prediction accuracy on test sets of different noise levels. Furthermore, the average angular error is about .
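The evaluation metric can be computed as follows. The threshold is left as an argument because its value is not given in this section, and the function names are illustrative.

```python
import numpy as np


def angular_errors_deg(pred, gt):
    """Per-face angle (degrees) between predicted and ground-truth normals."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # guard against rounding
    return np.degrees(np.arccos(cos))


def evaluate(pred, gt, threshold_deg):
    """Return (accuracy, mean angular error): accuracy is the fraction below the threshold."""
    err = angular_errors_deg(pred, gt)
    return float(np.mean(err < threshold_deg)), float(np.mean(err))
```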

5 Experimental results

In this section, extensive experimental results are provided to demonstrate the superior performance of our proposed mesh denoising method.

5.1 Comparison Study

We perform experimental comparisons on twelve test models, including six synthetic models (Joint, Twelve, Nicole, Fandisk, Table and Block) and six real scanned models (Angel, Iron, Cube, Rocketarm, Rabbit and Pierrot). Among the synthetic models, the noise in Fandisk, Table, Nicole and Block is Gaussian white noise, while in Joint and Twelve it is impulsive noise.

We compare the proposed NormalNet with several state-of-the-art algorithms in both objective and subjective evaluations: 1) local bilateral normal filtering (BNF) [19], 2) guided normal filtering (GNF) [10], 3) L0 minimization (L0M) [5], 4) bi-normal filtering (BI) [2], 5) the cascaded normal regression based scheme (CNR) [22], and 6) L1-median normal filtering (L1M) [12].

5.2 Objective Performance Comparison

Fig. 6: The distribution of angle error of the estimated guidance normals of NormalNet and GNF. The angle errors are clustered into 6 intervals and the face number in each interval is shown. In our result, the angle errors of most faces are located in the interval , while for GNF the angle errors of most faces are located in .
model Noise Level Type Metrics BNF [19] L0M [5] BI [2] GNF [10] NormalNet
Fandisk 0.3 Gaussian 1.504 1.850 1.509 1.458 1.404
8.094 10.141 11.670 7.615 4.981
Table 0.3 Gaussian 1.511 1.961 1.571 1.894 1.347
17.571 12.348 18.635 17.544 17.898
Joint 0.2 Impulsive 1.514 2.429 1.780 1.428 1.351
3.261 11.181 5.489 5.920 2.369
Twelve 0.5 Impulsive 12.41 20.00 11.68 5.955 4.415
13.465 12.147 20.038 11.099 7.012
Block 0.4 Gaussian 5.157 9.273 4.895 5.417 4.997
11.813 10.722 15.689 10.438 4.678
Nicole 0.2 Gaussian 8.584 10.635 10.195 10.081 10.033
4.880 5.002 4.861 5.056 4.778

TABLE I: Objective performance comparisons with respect to two metrics and .
Parameter Fandisk Table Joint Twelve Block Nicole
25 30 25 75 40 6
20 20 20 20 30 6
0.25 0.25 0.3 0.25 0.3 0.25
Parameter Iron Angel Rocketarm Rabbit Pierrot Cube
30 2 20 4 20 20
10 3 10 4 10 20
0.3 0.3 0.25 0.25 0.25 0.25
TABLE II: The settings of , and .

Two error metrics [19] are employed for evaluating the objective denoising results of synthetic models, including:

  • Mean angle square error, which measures the accuracy of the face normals.

  • L2 vertex-based mesh-to-mesh error, which measures the accuracy of the vertex positions.
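For reference, a sketch of the first metric under one common reading: the mean of the squared angles between corresponding denoised and ground-truth face normals, in radians by default. The exact normalization and units used in [19] may differ, and the vertex-based metric is not reproduced here.

```python
import numpy as np


def mean_angle_square_error(normals_denoised, normals_gt, degrees=False):
    """Mean of squared angles between corresponding face normals (radians unless degrees=True)."""
    a = normals_denoised / np.linalg.norm(normals_denoised, axis=1, keepdims=True)
    b = normals_gt / np.linalg.norm(normals_gt, axis=1, keepdims=True)
    ang = np.arccos(np.clip(np.sum(a * b, axis=1), -1.0, 1.0))
    if degrees:
        ang = np.degrees(ang)
    return float(np.mean(ang ** 2))
```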

The comparison results for both metrics are shown in Table I, where the best results are marked in bold. The proposed NormalNet achieves the best performance with respect to both metrics on most test models.

The accuracy of estimated guidance normals is critical to the final denoising performance. To further demonstrate the superiority of our learning-based guided normal filtering scheme, we provide a comparison on the guidance normals produced by our NormalNet and the state-of-the-art GNF method. For the test example Fandisk, the angle errors between the estimated guidance normals and ground truth are shown in Fig. 6, which are clustered into six intervals for clear observation. It can be found that, in our result, the angle errors of most faces are located in the interval , while for GNF the angle errors of most faces are located in . Moreover, the average errors of our NormalNet and GNF are and , respectively. These observations demonstrate that the guidance normal estimated by our method has higher accuracy than GNF.
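Clustering the per-face angle errors into six intervals, as in Fig. 6, is a simple histogram. The interval edges below are hypothetical placeholders, since the actual boundaries are not recoverable from the text.

```python
import numpy as np


def error_histogram(errors_deg, edges=(0, 5, 10, 15, 20, 30, 180)):
    """Number of faces whose angle error falls in each of the six intervals.

    np.histogram uses half-open bins [a, b) except the last, which is closed.
    """
    counts, _ = np.histogram(errors_deg, bins=edges)
    return counts
```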

5.3 Subjective Performance Comparison

5.3.1 Results on Synthetic Models

The subjective performance comparison results of our NormalNet against BNF, GNF, L0M and BI on six synthetic models are illustrated in Figs. 7,  8 and 9.


Fig. 7: Illustration of the denoising results on models Fandisk, Block and Nicole with Gaussian noise distributed in normal direction. (a) to (f) are the noisy mesh, the results of BNF [19], L0M [5], BI [2], GNF [10] and NormalNet.

Fig. 7 shows the denoising results on three models with Gaussian noise distributed along the normal direction. In the zoomed-in view of Fandisk, our scheme performs much better than the other methods in a challenging region that contains a corner and narrow edges: the corner is recovered well and the edges are sharp and clean. In Block, the region highlighted by the red window has a higher triangulation density. Benefiting from voxelization, our scheme preserves the structure information well and is thus less sensitive to sampling irregularity. For Nicole, which has rich structures, our scheme still achieves satisfactory feature recovery.

In Fig. 8 and Fig. 9, we perform comparisons on synthetic meshes with impulsive noise and with Gaussian noise distributed in random directions, respectively. Our scheme outperforms the other schemes in every feature region even though these kinds of noise are not contained in the training sets, which verifies the robustness of NormalNet.

In Figs. 13 and 14, we compare NormalNet with two state-of-the-art schemes, CNR [22] and L1M [12], using results provided by their authors. In Fig. 13, L1M fails to recover the edges in both models, since feature information is lost in the pre-filtering step of L1M. In Fig. 14, CNR cannot recover the small features in Fandisk or the irregularly sampled region in Block, since it cannot distinguish them from noise.


Fig. 8: Illustration of the denoising results on models Twelve and Joint with impulsive noise distributed in normal direction. (a) to (f) are the noisy mesh and the results of BNF [19], L0M [5], BI [2], GNF [10] and NormalNet.


Fig. 9: Illustration of the denoising results on model Table with Gaussian noise distributed in random direction. (a) to (f) are the noisy mesh and the results of BNF [19], L0M [5], BI [2], GNF [10] and NormalNet.

5.3.2 Results on Really Scanned Models

We further provide comparison results on six real scanned models, as illustrated in Figs. 10, 11 and 12. Note that real scanned models usually have lower noise levels.

Fig. 10 illustrates the denoising results on three scanned models with large-scale structures. In the results produced by our method, the structures are recovered very well, such as the long edge in Iron, the edges and corners in Cube and the cylinder in Rocketarm. In contrast, the other compared schemes fail to preserve the feature structures of Iron and Cube. For Rocketarm, BNF recovers the cylinder well but over-smooths the edges under the cylinder, while GNF, L0M and BI all generate pseudo-features on the cylinder to some degree.

Fig. 11 and Fig. 12 illustrate the denoising results on scanned meshes with more details. In Angel, our scheme succeeds in recovering the features on both the mouth and the nose, while GNF fails to recover the mouth and the other schemes fail to recover the nose. In Rabbit, the compared schemes tend to introduce pseudo-features in the highlighted region. In Pierrot, we compare our result only with GNF, since the codes of L0M and BI provided by their authors cannot process this mesh. The region in the red box is corrupted by serrated noise, for which GNF has difficulty computing accurate guidance normals, so its final denoising result is unsatisfactory. The proposed NormalNet recovers this region well.

5.4 Parameter Settings and Runtime

Model Fandisk Table Joint Twelve Block Nicole
Faces 12946 9108 11267 9216 17550 29437
Time(m) 4 2 4 2 5 10

Model Iron Angel Rocketarm Rabbit Pierrot Cube
Faces 168285 48090 20088 73679 127612 12453
Time(m) 168 21 7 48 98 3


TABLE III: Runtime per iteration of NormalNet on the twelve test mesh models.

Finally, we describe the parameter settings and runtime of our method. The following parameters are fixed for all cases in the experimental comparisons. The two voxelization parameters are chosen to generate enough small cubes to preserve structure information. The parameter related to training set generation is likewise fixed. For noisy example generation, we use two noise level ranges, referred to as the low and high noise level sets, and the corresponding threshold iteration number is set separately for each.

The two Gaussian parameters are set following the approach described in [10] and are tuned for specific meshes: one is set to the average distance between neighboring face centroids of the whole mesh, and the other is chosen within a preset range. The per-mesh parameter settings are shown in Table II.

Table III reports the runtime of the proposed NormalNet on the test meshes. On a typical PC with an Intel i7-7700K CPU and a GTX 1080 GPU, NormalNet processes about 1387 faces per minute.


Fig. 10: Illustration of the denoising results on scanned models Rocketarm, Iron and Cube. (a) to (f) are the noisy mesh, the results of BNF [19], L0M [5], BI [2], GNF [10] and NormalNet.


Fig. 11: Illustration of the denoising results on scanned models Rabbit and Angel, which contain more details. (a) to (f) are the noisy mesh and the results of BNF [19], L0M [5], BI [2], GNF [10] and NormalNet.


Fig. 12: Illustration of the denoising results of a scanned model Pierrot with complex structures. (a) to (c) are the noisy mesh, the results of GNF [10] and NormalNet.


Fig. 13: Illustration of the denoising results on models Fandisk and Twelve. Their noise levels are 0.4 and 0.3, respectively. (a) to (c) are the noisy mesh and the results of L1M and NormalNet.


Fig. 14: Denoising results on the models Fandisk and Block, both with a noise level of 0.2. (a) Noisy mesh; (b) and (c) results of CNR and NormalNet.

6 Conclusion

In this paper, we presented a novel convolutional neural network (CNN) based mesh denoising scheme, which employs a learning-based strategy to generate the guidance normals. First, to facilitate the 3D convolution operation, for each face in the mesh we proposed a voxelization strategy that transforms the irregular local mesh structure into a regular 4D-array form. The output of the voxelization is then fed into the proposed CNN-based learning scheme to estimate accurate guidance normals. Finally, guided normal filtering is applied to derive the filtered face normals, from which the denoised vertex positions are updated. Compared to state-of-the-art works, the proposed scheme generates accurate guidance normals and removes noise effectively while preserving original features and avoiding pseudo-features. The experimental results show that our scheme outperforms state-of-the-art works on both objective and subjective quality metrics.
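The last two stages summarized above (guided normal filtering, then vertex updating) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: the guidance normals are taken as a given input array (in NormalNet they are predicted by the CNN); the filter uses the joint bilateral form with Gaussian spatial and range kernels and face-area weights, as in GNF [10], but treats every face pair as neighbors; and the vertex update follows the iterative scheme of Sun et al. [38]. The function names are hypothetical.

```python
import numpy as np


def face_geometry(V, F):
    """Unit normals, centroids and areas of triangles (assumes no degenerate faces)."""
    tri = V[F]                                             # (nF, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    norm = np.linalg.norm(cross, axis=1, keepdims=True)
    return cross / norm, tri.mean(axis=1), 0.5 * norm[:, 0]


def guided_normal_filter(V, F, guidance, sigma_s, sigma_r):
    """One pass of joint bilateral filtering of face normals.

    The range kernel compares guidance normals; the spatial kernel compares
    face centroids. All face pairs are used (O(nF^2)), so small meshes only.
    """
    normals, centroids, area = face_geometry(V, F)
    d_s = np.linalg.norm(centroids[:, None] - centroids[None], axis=-1)
    d_r = np.linalg.norm(guidance[:, None] - guidance[None], axis=-1)
    W = (area[None, :] * np.exp(-d_s**2 / (2 * sigma_s**2))
         * np.exp(-d_r**2 / (2 * sigma_r**2)))
    filtered = W @ normals  # row i: weighted sum of the normals n_j
    # Normalizing to unit length makes the usual 1 / sum_j W_ij factor redundant.
    return filtered / np.linalg.norm(filtered, axis=1, keepdims=True)


def update_vertices(V, F, target_normals, iterations=10):
    """x_i <- x_i + 1/|F_i| * sum_{j in F_i} n_j (n_j . (c_j - x_i)), repeated."""
    V = np.array(V, dtype=float)
    for _ in range(iterations):
        centroids = V[F].mean(axis=1)
        delta = np.zeros_like(V)
        count = np.zeros(len(V))
        for k in range(3):                                 # each corner of each face
            vid = F[:, k]
            proj = np.sum(target_normals * (centroids - V[vid]), axis=1, keepdims=True)
            np.add.at(delta, vid, target_normals * proj)  # accumulates repeated indices
            np.add.at(count, vid, 1.0)
        V += delta / np.maximum(count, 1.0)[:, None]
    return V


# Toy example: unit square split into two triangles, vertex 1 lifted as "noise".
V = np.array([[0, 0, 0], [1, 0, 0.3], [1, 1, 0], [0, 1, 0]], dtype=float)
F = np.array([[0, 1, 2], [0, 2, 3]])
guidance = np.tile([0.0, 0.0, 1.0], (len(F), 1))  # stand-in for CNN-predicted normals
n_filtered = guided_normal_filter(V, F, guidance, sigma_s=1.0, sigma_r=0.3)
V_new = update_vertices(V, F, n_filtered)
```

Because the range kernel compares guidance normals rather than the noisy normals themselves, the quality of the guidance largely decides which faces are averaged together, which is why accurate guidance normals matter for preserving features.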

References

  • [1] H. Yagou, Y. Ohtake, and A. Belyaev, “Mesh smoothing via mean and median filtering applied to face normals,” in Geometric Modeling and Processing, 2002. Proceedings.   IEEE, 2002, pp. 124–131.
  • [2] M. Wei, J. Yu, W.-M. Pang, J. Wang, J. Qin, L. Liu, and P.-A. Heng, “Bi-normal filtering for mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 1, pp. 43–55, 2015.
  • [3] X. Lu, Z. Deng, and W. Chen, “A robust scheme for feature-preserving mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 3, pp. 1181–1194, 2016.
  • [4] M. Wei, L. Liang, W.-M. Pang, J. Wang, W. Li, and H. Wu, “Tensor voting guided mesh denoising,” IEEE Transactions on Automation Science and Engineering, 2016.
  • [5] L. He and S. Schaefer, “Mesh denoising via L0 minimization,” ACM Transactions on Graphics (TOG), vol. 32, no. 4, p. 64, 2013.
  • [6] R. Wang, Z. Yang, L. Liu, J. Deng, and F. Chen, “Decoupling noise and features via weighted L1-analysis compressed sensing,” ACM Transactions on Graphics (TOG), vol. 33, no. 2, p. 18, 2014.
  • [7] S. Yoshizawa, A. Belyaev, and H.-P. Seidel, “Smoothing by example: Mesh denoising by averaging with similarity-based weights,” in Shape Modeling and Applications, 2006. SMI 2006. IEEE International Conference on.   IEEE, 2006, pp. 9–9.
  • [8] G. Rosman, A. Dubrovina, and R. Kimmel, “Patch-collaborative spectral point-cloud denoising,” in Computer Graphics Forum, vol. 32, no. 8.   Wiley Online Library, 2013, pp. 1–12.
  • [9] J. Digne, “Similarity based filtering of point clouds,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on.   IEEE, 2012, pp. 73–79.
  • [10] W. Zhang, B. Deng, J. Zhang, S. Bouaziz, and L. Liu, “Guided mesh normal filtering,” in Computer Graphics Forum, vol. 34, no. 7.   Wiley Online Library, 2015, pp. 23–34.
  • [11] T. Li, J. Wang, H. Liu, and L.-g. Liu, “Efficient mesh denoising via robust normal filtering and alternate vertex updating,” Frontiers of Information Technology and Electronic Engineering, vol. 18, no. 11, pp. 1828–1842, 2017.
  • [12] X. Lu, W. Chen, and S. Schaefer, “Robust mesh denoising via vertex pre-filtering and L1-median normal filtering,” Computer Aided Geometric Design, vol. 54, pp. 49–60, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167839617300638
  • [13] S. K. Yadav, U. Reitebuch, and K. Polthier, “Robust and high fidelity mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. PP, no. 99, pp. 1–1, 2017.
  • [14] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2015.
  • [15] K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN-based image denoising,” IEEE Transactions on Image Processing, vol. 27, no. 9, pp. 4608–4622, 2017.
  • [16] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. PP, no. 99, pp. 1–1, 2017.
  • [17] S. Fleishman, I. Drori, and D. Cohen-Or, “Bilateral mesh denoising,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3. ACM, 2003, pp. 950–953.
  • [18] T. R. Jones, F. Durand, and M. Desbrun, “Non-iterative, feature-preserving mesh smoothing,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3.   ACM, 2003, pp. 943–949.
  • [19] Y. Zheng, H. Fu, O. K.-C. Au, and C.-L. Tai, “Bilateral normal filtering for mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 10, pp. 1521–1530, 2011.
  • [20] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, “Digital photography with flash and no-flash image pairs,” in ACM SIGGRAPH 2004 Papers, ser. SIGGRAPH ’04.   New York, NY, USA: ACM, 2004, pp. 664–672.
  • [21] H. Zhang, C. Wu, J. Zhang, and J. Deng, “Variational mesh denoising using total variation and piecewise constant function space,” IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 7, pp. 873–886, 2015.
  • [22] P.-S. Wang, Y. Liu, and X. Tong, “Mesh denoising via cascaded normal regression,” ACM Transactions on Graphics (SIGGRAPH Asia), vol. 35, no. 6, 2016.
  • [23] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3D shape recognition,” in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 945–953.
  • [24] B. Shi, S. Bai, Z. Zhou, and X. Bai, “DeepPano: Deep panoramic representation for 3-D shape recognition,” IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2339–2343, Dec 2015.
  • [25] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1912–1920.
  • [26] D. Maturana and S. Scherer, “VoxNet: A 3D convolutional neural network for real-time object recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 922–928.
  • [27] P. Wang, Y. Liu, Y. Guo, C. Sun, and X. Tong, “O-CNN: Octree-based convolutional neural networks for 3D shape analysis,” CoRR, vol. abs/1712.01537, 2017. [Online]. Available: http://arxiv.org/abs/1712.01537
  • [28] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu, “High-resolution shape completion using deep neural networks for global structure and local geometry inference,” CoRR, vol. abs/1709.07599, 2017.
  • [29] Q. Tan, L. Gao, Y. Lai, J. Yang, and S. Xia, “Mesh-based autoencoders for localized deformation component analysis,” CoRR, vol. abs/1709.04304, 2017.
  • [30] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers, “Anisotropic diffusion descriptors,” Computer Graphics Forum, 2016.
  • [31] L. Yi, H. Su, X. Guo, and L. J. Guibas, “SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation,” CoRR, vol. abs/1612.00606, 2016.
  • [32] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph CNN for learning on point clouds,” CoRR, vol. abs/1801.07829, 2018.
  • [33] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” CoRR, vol. abs/1612.00593, 2016.
  • [34] B. Hua, M. Tran, and S. Yeung, “Point-wise convolutional neural network,” CoRR, vol. abs/1712.05245, 2017.
  • [35] R. Klokov and V. S. Lempitsky, “Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models,” in IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, 2017, pp. 863–872.
  • [36] G. Riegler, A. O. Ulusoy, and A. Geiger, “OctNet: Learning deep 3D representations at high resolutions,” CoRR, vol. abs/1611.05009, 2016.
  • [37] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Computer Vision, 1998. Sixth International Conference on.   IEEE, 1998, pp. 839–846.
  • [38] X. Sun, P. Rosin, R. Martin, and F. Langbein, “Fast and effective feature-preserving mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 5, 2007.
  • [39] T. Akenine-Möller, “Fast 3D triangle-box overlap testing,” in ACM SIGGRAPH 2005 Courses, ser. SIGGRAPH ’05. New York, NY, USA: ACM, 2005. [Online]. Available: http://doi.acm.org/10.1145/1198555.1198747
  • [40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • [41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.