1 Introduction
Recently, demand for high-fidelity 3D mesh models of real objects has emerged in many domains, such as computer graphics, geometric modeling, computer-aided design and the movie industry. However, due to the limited accuracy of scanning devices, raw mesh models are inevitably contaminated by noise, which corrupts features and profoundly affects subsequent applications of the meshes. Hence, mesh denoising has become an active research topic in geometry processing.
Mesh denoising is an ill-posed inverse problem. Its goal is to smooth a noisy surface while preserving the real object features and without introducing unnatural geometric distortions. It is a challenging task, especially for large and dense meshes and high noise levels. The key to successful mesh denoising is to distinguish actual geometric features, such as localized curvature changes and small-scale details, from noise generated by scanners. The literature on mesh denoising is rich, including filtering-based [1, 2], feature-extraction-based [3, 4], optimization-based [5, 6] and similarity-based [7, 8, 9] methods. Among them, the two-stage approach has become popular in recent years: smoothed face normals are first derived, and the vertex positions are then updated to agree with these normals with respect to some objective function [10, 11, 12, 13]. This approach either introduces some kind of prior, such as L1-norm sparsity [12], or takes advantage of the redundant information in the corrupted mesh itself [10, 11]. However, since meshes exhibit a variety of irregular structures, no single prior assumption is guaranteed to always hold. The inaccurate information from the noisy mesh also limits denoising performance. In these cases, as shown in our experimental comparisons, even state-of-the-art denoising approaches cannot produce satisfactory results.
In the counterpart problem of 2D image denoising, deep learning based strategies have been widely applied and have achieved great success, such as [14, 15, 16]. However, for mesh denoising, to the best of our knowledge, there are no works following this line. Two main difficulties prevent the use of convolutional neural networks (CNNs) in mesh denoising:

Meshes have complex connectivity among vertices. It is challenging to collect enough mesh examples to build an effective end-to-end learning model while avoiding overfitting.

In contrast to the regular grid structure of 2D images, meshes have irregular topology. It is not straightforward to apply the regular 3D convolutional kernels of CNNs to a mesh.
In this work, we propose a deep learning based face normal filtering scheme, called NormalNet, which is the first work in the literature using CNNs for mesh denoising. NormalNet is tailored to overcome the above difficulties:

For the first problem, instead of a modern end-to-end architecture, we follow the two-step framework in [10], which first generates the guidance normals and then updates the vertex positions. In our method, the guidance normals are derived from the proposed CNN-based learning model.

For the second problem, a voxelization strategy is proposed to convert the irregular local mesh structure into a regular 4D-array form, which can then be processed by convolution operations.
In NormalNet, the ground truth normals are regarded as the target guidance normals in training. Since mesh structures change considerably during filtering-based denoising, it is difficult to obtain a single CNN that works well in every iteration. Instead, NormalNet contains multiple CNNs corresponding to multiple iterations. To generate the training set for a specific iteration, we collect several meshes, add noise and apply filtering to them, and then select a number of faces from the noisy meshes as the training set. To ensure that the training set contains enough feature structures, faces are divided into several classes according to shape features and are selected evenly from the classes to make up the training set. Since it is difficult to obtain the ground truth of real scanned meshes, we train our network only on synthesized examples and show by experiments that our model generalizes well to both synthesized and real scanned meshes. Compared to state-of-the-art schemes, the proposed scheme can generate accurate guidance normals and remove noise effectively while preserving original features and avoiding pseudo-features.
The rest of this paper is organized as follows. Section 2 briefly summarizes related works; the proposed scheme is introduced in Section 3; the training of NormalNet is elaborated in Section 4; experimental results are presented in Section 5; Section 6 concludes the paper.
2 Related works
In this section, we briefly review related works on filtering-based mesh denoising and neural network based 3D model processing.
2.1 Filtering-based Mesh Denoising
Owing to the edge-preserving property of the bilateral filter, researchers have made many attempts to adopt bilateral filtering in mesh denoising [17, 18, 19]. Nevertheless, the photometric weight in the bilateral filter cannot be estimated accurately from a noise-corrupted mesh. Afterwards, the joint bilateral filter [20] was proposed to improve the capability of bilateral filtering, in which the photometric weight is computed from a reliable guidance image. Inspired by this idea, Zhang et al. [10] succeeded in applying joint bilateral filtering to mesh denoising, with the guidance information obtained as the average normal of a local patch. This scheme preserves features well, but cannot achieve satisfactory results in regions with complex shapes and sometimes introduces pseudo-features. To overcome the limitations of [10], in a subsequent work [11], the guidance normals are computed from a corner-aware neighborhood, which adapts to the shapes of corners and edges.
Recently, there have been increasing efforts to exploit geometric attributes for mesh denoising. In [21], normal filtering was done by a total variation (TV) based denoising scheme, which assumes that the normals are piecewise constant. Wei et al. [2] proposed to cluster faces into piecewise smooth patches and refine face normals with the help of vertex normal fields. Inspired by [5], Lu et al. [12] applied an additional vertex filtering step before the L1-median face normal filtering, which proved effective for high noise levels and noise distributed in random directions. In [13], the Tukey bi-weight similarity function was proposed to replace the similarity function in the bilateral weight computation; in addition, an edge-weighted Laplace operator was introduced for vertex updating to reduce face normal flips.
It is worth noting that, to the best of our knowledge, there are no works using deep neural networks for filtering-based mesh denoising. In [22], Wang et al. proposed to model a regression function with neural networks. However, these networks are not convolutional and have only three layers.
2.2 Neural Networks based 3D Model Processing
Driven by the great success of deep learning in image processing, researchers in graphics are also trying to employ deep neural networks for 3D model processing. However, due to irregular connectivity, processing 3D models with neural networks remains challenging. Numerous works focus on transforming 3D models into regular data. For instance, in [23] and [24], 3D models were represented by 2D rendered images and panoramic views, respectively. In addition, some works [25, 26, 27, 28] voxelized the models to transform them into regular 3D data. Moreover, in [29, 30, 31, 32], meshes were represented in the spectral or spatial domain for further processing.
In addition to these transform-based techniques, directly running neural networks on irregular data has also been extensively studied. PointNet [33] was one of the first network architectures that can handle point cloud data. Subsequently, in [34], a pointwise convolutional operator was proposed, which can extract features from each point. In [35], a kd-tree was constructed on the point cloud and then used as the input of neural networks. A similar idea appears in [36], in which the points were organized by an octree. However, these schemes are designed and validated for point cloud data; none of them addresses mesh denoising.
3 The Proposed NormalNet
In this section, we elaborate the proposed learning based guided normal filtering for mesh denoising.
3.1 Overview
We first give an overview of the proposed learning-based scheme for mesh denoising. As illustrated in Fig. 1, the proposed method consists of a series of iteratively performed modules. Since mesh structures change considerably during the iterations, it is difficult to obtain a single CNN that works well for all iterations. Another choice is to train a CNN for each iteration. However, this strategy would introduce a prohibitive computational burden. Instead, we propose to train multiple CNN models adapted to the noise levels of the iterations. Specifically, since the structure changes between successive iterations are remarkably large in the first iterations, we train a separate CNN model for each of these iterations. For the remaining iterations, a single CNN model is used for the corresponding modules. The threshold iteration number is determined by experiments.
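The model selection rule described above can be illustrated with a minimal sketch. The function name `select_model` and its arguments are hypothetical and introduced only for illustration; the threshold is taken to be the number of per-iteration models supplied.

```python
def select_model(iteration, per_iteration_models, shared_model):
    """Return the CNN to use at a given 1-based filtering iteration.

    per_iteration_models: list of models, one for each of the first
        iterations (its length plays the role of the threshold).
    shared_model: the single model used for all remaining iterations.
    All names here are illustrative assumptions, not the authors' code.
    """
    threshold = len(per_iteration_models)
    if iteration <= threshold:
        return per_iteration_models[iteration - 1]
    return shared_model
```

With two per-iteration models, iterations 1 and 2 use their own model and every later iteration falls back to the shared one.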
All modules share the same workflow. First, to facilitate the convolution operation, for a face $f_i$ in the mesh, we propose a voxelization strategy that transforms the irregular local mesh structure around $f_i$ into a regular 4D-array form. The output of voxelization is then fed to the proposed CNN-based learning scheme to estimate an accurate guidance normal. Finally, guided normal filtering is applied to derive the filtered face normal $n_i'$:
$$n_i' = e_i \sum_{f_j \in N_i} a_j \, K_s(c_i, c_j) \, K_r(g_i, g_j) \, n_j \qquad (1)$$
where $c_j$ and $n_j$ are the center and normal of $f_j$; $N_i$ is the set of geometrical neighboring faces of $f_i$, selected using the method in [10]; $a_j$ is the area of $f_j$; $c_i$ and $g_i$ are the center and the guidance normal of $f_i$; $e_i$ is a normalization factor that ensures $n_i'$ is a unit vector; $K_s$ and $K_r$ are Gaussian kernels [37], computed as:
$$K_s(c_i, c_j) = \exp\left(-\frac{\|c_i - c_j\|^2}{2\sigma_s^2}\right) \qquad (2)$$
$$K_r(g_i, g_j) = \exp\left(-\frac{\|g_i - g_j\|^2}{2\sigma_r^2}\right) \qquad (3)$$
where $\sigma_s$ and $\sigma_r$ are the Gaussian function parameters. The vertex positions are updated according to the filtered normals $n_i'$, following the idea of [38]. The face normal filtering and the vertex position update are each repeated iteratively for a prescribed number of times, and we finally obtain the denoised mesh.
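As a rough illustration of one pass of guided normal filtering, the sketch below weights each neighboring face normal by its area, a spatial Gaussian on face centers and a range Gaussian on guidance normals, then renormalizes the sum. It assumes the standard form of the filter in [10]; the function name, the argument layout and the convention that each neighbor list contains the face itself are assumptions made for this example.

```python
import numpy as np

def guided_normal_filter(centers, normals, guidance, areas, neighbors,
                         sigma_s, sigma_r):
    """One pass of guided normal filtering (illustrative sketch).

    centers, normals, guidance: (F, 3) float arrays.
    areas: (F,) float array of face areas.
    neighbors: list of index lists, one per face (assumed to include the face).
    sigma_s, sigma_r: spatial and range Gaussian parameters.
    """
    filtered = np.zeros_like(normals, dtype=float)
    for i, nbrs in enumerate(neighbors):
        nbrs = np.asarray(nbrs)
        # Spatial kernel on face centers, range kernel on guidance normals.
        d_s = np.linalg.norm(centers[nbrs] - centers[i], axis=1)
        d_r = np.linalg.norm(guidance[nbrs] - guidance[i], axis=1)
        w = (areas[nbrs]
             * np.exp(-d_s ** 2 / (2.0 * sigma_s ** 2))
             * np.exp(-d_r ** 2 / (2.0 * sigma_r ** 2)))
        n = (w[:, None] * normals[nbrs]).sum(axis=0)
        length = np.linalg.norm(n)
        # Normalize to a unit vector; fall back to the input normal if degenerate.
        filtered[i] = n / length if length > 0 else normals[i]
    return filtered
```

When the guidance normals of two faces differ strongly relative to `sigma_r`, the range kernel suppresses their mutual influence, which is how a reliable guidance signal keeps edges sharp.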
3.2 Voxelization of Mesh
The key to the success of our CNN-based mesh denoising scheme is to transform the local irregular structure around a face into a regular form, such that the convolution operation in CNNs can be easily performed, while preserving shape features and face normal information.
To this end, as illustrated in Fig. 2, for a face in the mesh, the local 3D structure around it is split into regular small cubes. Each cube is then assigned a label, which is the average normal of all faces it contains. Before splitting, a normalization process is applied to improve the robustness of voxelization, which involves two operations: rotation and translation. In this way, all faces are normalized to a similar direction and position. Specifically, for a given face, a 2-ring patch is built around it and the average normal of this patch is computed. We then compute two matrices: a rotation matrix that rotates the average normal to a specific angle, and a translation matrix that moves the face center to (0,0,0). The whole mesh is rotated and translated according to these two matrices. For a vertex of the mesh, its new position after rotation and translation is:
(4) 
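The normalization step can be sketched as follows. This is only an illustration under stated assumptions: the target direction is taken to be the z-axis (the specific angle used in the paper is not recoverable here), the translation is applied before the rotation so that the face center lands exactly at the origin, and the helper names `rotation_between` and `normalize_patch` are hypothetical.

```python
import numpy as np

def rotation_between(a, b):
    """Rotation matrix that maps direction a onto direction b (Rodrigues formula)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    s = np.linalg.norm(v)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)  # already aligned
        # Opposite directions: rotate by 180 degrees about an axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + k + k @ k * ((1.0 - c) / s ** 2)

def normalize_patch(vertices, face_center, avg_normal, target=(0.0, 0.0, 1.0)):
    """Move face_center to the origin, then rotate avg_normal onto target."""
    rot = rotation_between(avg_normal, target)
    return (np.asarray(vertices, dtype=float) - face_center) @ rot.T
```

After this step every local patch shares the same reference frame, so the voxel grid built around different faces is directly comparable.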
After normalization, the extent of the local mesh structure around the face is fixed by a parameter that decides the range of this space. The remaining issue is to determine the size of the small cubes. In our work, the side length of the cubes is computed as:
(5) 
where is the average distance between adjacent faces and , is a parameter that controls the size of the small cubes. The center of is set as .
For each cube, we employ the fast 3D triangle-box overlap testing strategy presented in [39] to select the faces that overlap this cube. If at least one face is selected for the cube, the label assigned to it is the average normal of all selected faces; otherwise, the label is set to (0,0,0). In this way, we convert the irregular local mesh structure into a regular 4D array. In the practical implementation, the parameters are chosen so that the space contains enough faces and is split into enough cubes. The output of voxelization is then used as the input of the proposed network introduced in the next subsection.
3.3 CNN-based Guidance Normal Learning
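A simplified sketch of the voxel labeling is given below. To keep it short, it assigns each face to the single cube that contains its centroid instead of running the triangle-box overlap test of [39]; this substitution, the function name `voxelize` and the parameters `grid_size` and `cube_len` are assumptions made for illustration. Labels are plain arithmetic means of the face normals, and empty cubes keep the label (0,0,0).

```python
import numpy as np

def voxelize(face_centers, face_normals, grid_size, cube_len):
    """Build a (grid_size, grid_size, grid_size, 3) array of averaged normals.

    The grid is centered at the origin, which is where the normalized
    face center lies. Faces whose centroid falls outside the grid are ignored.
    """
    face_centers = np.asarray(face_centers, dtype=float)
    face_normals = np.asarray(face_normals, dtype=float)
    grid = np.zeros((grid_size, grid_size, grid_size, 3))
    counts = np.zeros((grid_size, grid_size, grid_size), dtype=int)
    half = grid_size * cube_len / 2.0
    idx = np.floor((face_centers + half) / cube_len).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    for (i, j, k), n in zip(idx[inside], face_normals[inside]):
        grid[i, j, k] += n
        counts[i, j, k] += 1
    filled = counts > 0
    # Average the accumulated normals in non-empty cubes.
    grid[filled] /= counts[filled][:, None]
    return grid
```

The last axis holds the three normal components, so the result has the 4D layout that a 3D convolution with three input channels expects.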
Using the 4D-array data generated by voxelization as the input, a CNN-based model is employed to derive the guidance normal of the face. Our NormalNet is mainly inspired by the philosophy of ResNet [40] and VGGNet [41]. We use filters in most of the convolution layers and follow two simple design principles: i) layers with the same output feature map size have the same number of filters; and ii) in order to reduce the computational complexity in each layer, we double the number of filters when the feature map size is halved.
The network architecture is shown in Fig. 4. The input layer is followed by three residual blocks with 32, 64 and 128 channels, respectively. In each residual block, we add a shortcut connection between two layers, as in ResNet, to further exploit the information in earlier feature maps and accelerate convergence. The two convolution feature maps are summed element-wise and passed to the next layer after rectification. We perform downsampling directly by convolution with a stride of 2 in the first layer of each residual block. In addition, a global max-pooling layer is adopted after the last convolution layer. The network ends with four fully-connected layers: the first two have 1024 channels each, the third has 128 channels, and the fourth predicts the three coordinates of the face normal and thus contains 3 channels. The total number of weighted layers is 25. The first 24 layers use ReLU, while the last layer uses Tanh to ensure that the output lies in [-1, 1].
4 NormalNet Training
In this section, we describe in detail the training process of the proposed NormalNet, including: i) the generation of the training set, and ii) the training of the networks.
4.1 Generation of the Training Data
Unlike natural images, it is difficult to collect enough mesh examples to build an end-to-end training scheme for deep networks. In our case, the problem is more challenging, since we aim to train multiple CNNs for the iterations. Moreover, after the first iteration, the input mesh for training is the denoised result of the previous iteration, which cannot be generated by simply adding noise to a clean mesh.
To overcome the above problems, we propose an iterative framework for training data generation, as illustrated in Fig. 3. Our goal is to generate a specific training data set and a specific CNN for each iteration. Starting from the 12 synthesized mesh models shown in Fig. 5, Gaussian noise distributed in the normal direction, with noise levels covering a range of values, is added to obtain the initial noisy examples. Noise of various levels is added in order to improve the robustness of the training process. Then a large number of faces are selected from the noisy examples, and the local structures around them are voxelized to generate the training data set, following the steps described in Section 3.2. The ground truth normals are used as target guidance normals. We then train a CNN model on the resulting pairs of voxelized input and target normal. The denoised examples for the next iteration are obtained by applying guided normal filtering using the ground truth normals as the guidance. The above process is repeated a certain number of times. When the iteration number exceeds the threshold iteration number, the examples from all of these iterations are mixed to generate a single training data set.
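The first step of this pipeline, adding Gaussian noise along the normal direction, can be sketched as below. The sketch assumes the common convention that the noise level is the standard deviation expressed as a fraction of the mean edge length; that convention, the function name `add_normal_noise` and its parameters are assumptions for illustration rather than details confirmed by the text.

```python
import numpy as np

def add_normal_noise(vertices, vertex_normals, mean_edge_length,
                     noise_level, rng):
    """Displace each vertex along its unit normal by zero-mean Gaussian noise.

    noise_level is assumed to scale the mean edge length to give the
    standard deviation of the displacement.
    """
    vertices = np.asarray(vertices, dtype=float)
    vertex_normals = np.asarray(vertex_normals, dtype=float)
    sigma = noise_level * mean_edge_length
    offsets = rng.normal(0.0, sigma, size=len(vertices))
    return vertices + offsets[:, None] * vertex_normals

# Example: several noise levels applied to the same clean vertices.
rng = np.random.default_rng(0)
clean = np.zeros((4, 3))
unit_z = np.tile([0.0, 0.0, 1.0], (4, 1))
noisy_sets = [add_normal_noise(clean, unit_z, 1.0, level, rng)
              for level in (0.1, 0.2, 0.3)]
```

Generating several noisy copies per clean mesh, one for each noise level, is what lets a single training set cover a range of corruption strengths.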
It is worth noting that the faces selected from the noisy examples have various features, such as faces in smooth regions, on edges or at corners. Since faces on edges and corners are far fewer than faces in smooth regions, randomly selecting faces would lead the network to overfit smooth faces. Instead, we classify all faces into several categories and randomly select the same number of faces from each category, so that the training process is balanced across features. Specifically, we first detect edges across which the normal difference between adjacent faces is larger than a threshold; these edges are regarded as boundaries. Then, for each face, the 1-ring patch around it is divided into several subregions according to the detected boundaries. The number of subregions indicates which category the face belongs to: smooth face, edge face or corner face.
4.2 Network Training
We adopt batch normalization (BN) right after each layer and before the activation. The loss function is simply the mean squared error (MSE). We use the truncated normal distribution to initialize the weights and train the network from scratch. For optimization, we choose the Adam algorithm with a mini-batch size of 100, with the Adam parameters following the default setting in TensorFlow. The learning rate starts from 0.001 and decays exponentially every fixed number of training steps (such as 10000). Each specific CNN is trained individually to approximate the corresponding ground truth.
The evaluation metric for the network is defined as the angle between the output normal and the ground truth normal. If this angle is less than a given threshold, the prediction is considered correct, and otherwise incorrect. The average angular error over the entire test set is also used as a reference to measure the training results. Finally, we achieved prediction accuracy on test sets of different noise levels. Furthermore, the average angular error is about .
5 Experimental results
In this section, extensive experimental results are provided to demonstrate the superior performance of our proposed mesh denoising method.
5.1 Comparison Study
We perform experimental comparisons on twelve test models, including six synthetic models (Joint, Twelve, Nicole, Fandisk, Table and Block) and six real scanned models (Angel, Iron, Cube, Rocketarm, Rabbit and Pierrot). For the synthetic models, the noise in Fandisk, Table, Nicole and Block is Gaussian white noise, while the noise in Joint and Twelve is impulsive noise.
We compare the proposed NormalNet with several state-of-the-art algorithms in both objective and subjective evaluations, including 1) local bilateral normal filtering (BNF) [19], 2) guided normal filtering (GNF) [10], 3) L0 minimization (L0M) [5], 4) bi-normal filtering (BI) [2], 5) the cascaded normal regression based scheme (CNR) [22] and 6) L1-median normal filtering (L1M) [12].
5.2 Objective Performance Comparison
Table I
Model | Noise Level | Type | Metrics | BNF [19] | L0M [5] | BI [2] | GNF [10] | NormalNet
Fandisk | 0.3 | Gaussian | | 1.504 | 1.850 | 1.509 | 1.458 | 1.404
 | | | | 8.094 | 10.141 | 11.670 | 7.615 | 4.981
Table | 0.3 | Gaussian | | 1.511 | 1.961 | 1.571 | 1.894 | 1.347
 | | | | 17.571 | 12.348 | 18.635 | 17.544 | 17.898
Joint | 0.2 | Impulsive | | 1.514 | 2.429 | 1.780 | 1.428 | 1.351
 | | | | 3.261 | 11.181 | 5.489 | 5.920 | 2.369
Twelve | 0.5 | Impulsive | | 12.41 | 20.00 | 11.68 | 5.955 | 4.415
 | | | | 13.465 | 12.147 | 20.038 | 11.099 | 7.012
Block | 0.4 | Gaussian | | 5.157 | 9.273 | 4.895 | 5.417 | 4.997
 | | | | 11.813 | 10.722 | 15.689 | 10.438 | 4.678
Nicole | 0.2 | Gaussian | | 8.584 | 10.635 | 10.195 | 10.081 | 10.033
 | | | | 4.880 | 5.002 | 4.861 | 5.056 | 4.778

Table II
Parameter | Fandisk | Table | Joint | Twelve | Block | Nicole
 | 25 | 30 | 25 | 75 | 40 | 6
 | 20 | 20 | 20 | 20 | 30 | 6
 | 0.25 | 0.25 | 0.3 | 0.25 | 0.3 | 0.25
Parameter | Iron | Angel | Rocketarm | Rabbit | Pierrot | Cube
 | 30 | 2 | 20 | 4 | 20 | 20
 | 10 | 3 | 10 | 4 | 10 | 20
 | 0.3 | 0.3 | 0.25 | 0.25 | 0.25 | 0.25
Two error metrics [19] are employed to evaluate the objective denoising results on synthetic models:

The mean square angular error, which measures the accuracy of the face normals.

The L2 vertex-based mesh-to-mesh error, which measures the accuracy of the vertex positions.
The comparison results for both metrics are shown in Table I, where the best results are in bold. It can be seen that the proposed NormalNet achieves the best performance with respect to both metrics on most test models.
The accuracy of the estimated guidance normals is critical to the final denoising performance. To further demonstrate the advantage of our learning-based guided normal filtering scheme, we compare the guidance normals produced by our NormalNet and by the state-of-the-art GNF method. For the test example Fandisk, the angle errors between the estimated guidance normals and the ground truth are shown in Fig. 6, clustered into six intervals for clear observation. It can be found that, in our result, the angle errors of most faces are located in the interval , while for GNF the angle errors of most faces are located in . Moreover, the average errors of our NormalNet and GNF are and , respectively. These observations demonstrate that the guidance normals estimated by our method are more accurate than those of GNF.
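The angle error comparison described above can be reproduced with a short sketch such as the following. The interval edges used in the example are placeholders chosen for illustration only, since the actual intervals of Fig. 6 are not stated here; the function names are likewise hypothetical.

```python
import numpy as np

def angular_errors_deg(pred, truth):
    """Per-face angle in degrees between predicted and ground truth normals."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    truth = truth / np.linalg.norm(truth, axis=1, keepdims=True)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos = np.clip(np.sum(pred * truth, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def count_by_interval(errors, edges):
    """Number of faces whose error falls into each interval given by edges."""
    counts, _ = np.histogram(errors, bins=edges)
    return counts

# Hypothetical six-interval binning of the errors.
example_edges = [0, 5, 10, 20, 40, 90, 180]
```

The mean of `angular_errors_deg` gives the average angular error, and `count_by_interval` gives the per-interval face counts used for a histogram-style comparison.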
5.3 Subjective Performance Comparison
5.3.1 Results on Synthetic Models
The subjective performance comparison results of our NormalNet against BNF, GNF, L0M and BI on six synthetic models are illustrated in Figs. 7, 8 and 9.
Fig. 7 shows the denoising results of three models with Gaussian noise distributed in the normal direction. In Fandisk, the zoomed-in view shows that our scheme performs much better than the other methods in a challenging region that contains a corner and narrow edges. The corner is recovered well and the edge is sharp and clean. In Block, the region highlighted by the red window has a higher triangulation density. Benefiting from voxelization, our scheme is able to preserve the structure information well and is thus less sensitive to sampling irregularity. For Nicole, which has rich structures, our scheme still achieves satisfactory feature recovery.
In Fig. 8 and Fig. 9, we perform comparisons on synthetic meshes with impulsive noise and with Gaussian noise distributed in random directions, respectively. Our scheme outperforms the other schemes in every feature region although these kinds of noise are not contained in the training sets, which verifies the robustness of NormalNet.
In Figs. 13 and 14, we compare NormalNet with two state-of-the-art schemes, CNR [22] and L1M [12], using the results provided by their authors. In Fig. 13, L1M fails to recover the edges in both models, since feature information is lost due to the pre-filtering in L1M. In Fig. 14, CNR cannot recover the small features in Fandisk or the irregular sampling region in Block, since CNR is not able to distinguish them from noise.
5.3.2 Results on Real Scanned Models
We further provide comparison results on six real scanned models, as illustrated in Figs. 10, 11 and 12. It is worth noting that real scanned models usually have lower noise levels.
Fig. 10 illustrates the denoising results of three scanned models with large-scale structures. In the results produced by our method, the structures are recovered very well, such as the long edge in Iron, the edge and corner in Cube and the cylinder in Rocketarm. In contrast, the other compared schemes fail to preserve the feature structures of Iron and Cube. For Rocketarm, BNF recovers the cylinder well but over-smooths the edges under the cylinder. GNF, L0M and BI all generate pseudo-features on the cylinder to some extent.
Fig. 11 and Fig. 12 illustrate the denoising results of scanned meshes with more details. In Angel, our scheme succeeds in recovering the features on both the mouth and the nose, while GNF fails to recover the mouth and the other schemes fail to recover the nose. In Rabbit, in the highlighted region, the compared schemes tend to introduce pseudo-features. In Pierrot, we only compare our result with GNF, since the codes of L0M and BI provided by their authors cannot process this mesh. The region in the red box is corrupted by serrated noise. For GNF, it is difficult to compute an accurate guidance normal in this region, and thus the final denoising result is not satisfactory. The proposed NormalNet recovers this region well.
5.4 Parameter Settings and Runtime
Table III
Model | Fandisk | Table | Joint | Twelve | Block | Nicole
Faces | 12946 | 9108 | 11267 | 9216 | 17550 | 29437
Time (min) | 4 | 2 | 4 | 2 | 5 | 10
Model | Iron | Angel | Rocketarm | Rabbit | Pierrot | Cube
Faces | 168285 | 48090 | 20088 | 73679 | 127612 | 12453
Time (min) | 168 | 21 | 7 | 48 | 98 | 3

Finally, we clarify the parameters setting and runtime of our method. The parameter setting is fixed for all cases in experimental comparisons. Specifically, the parameters involved in voxelization are set as and , in order to generate enough small cubes to preserve structure information. For the training set generation, the related parameter is set as . For noisy example set generation, we set two kinds of noise level range: and high , which are referred to as low and high noise level sets. The corresponding thresholding iteration number is set as and , respectively.
The parameters and are set following the approach described in [10], which are tuned for specific meshes. is set as the average distance between neighboring face centroids of the whole mesh. is set as a value within the range . The parameter settings of , and are shown in Table II.
In Table III, we provide the runtime of the proposed NormalNet on the test meshes. On a typical PC with an Intel i7-7700K CPU and a GTX 1080 GPU, NormalNet processes about 1387 faces per minute.
6 Conclusion
In this paper, we presented a novel convolutional neural network based mesh denoising scheme, which employs a learning-based strategy to generate the guidance normals. First, to facilitate the 3D convolution operation, for each face in the mesh, we propose a voxelization strategy that transforms the irregular local mesh structure into a regular 4D-array form. The output of voxelization is then fed to the proposed CNN-based learning scheme to estimate accurate guidance normals. Finally, guided normal filtering is applied to derive the filtered face normals, according to which the denoised vertex positions are updated. The experimental results show that the proposed scheme generates accurate guidance normals and removes noise effectively while preserving original features and avoiding pseudo-features, and that it outperforms state-of-the-art works on both objective and subjective quality metrics.
References
 [1] H. Yagou, Y. Ohtake, and A. Belyaev, “Mesh smoothing via mean and median filtering applied to face normals,” in Geometric Modeling and Processing, 2002. Proceedings. IEEE, 2002, pp. 124–131.
 [2] M. Wei, J. Yu, W.M. Pang, J. Wang, J. Qin, L. Liu, and P.A. Heng, “Binormal filtering for mesh denoising,” IEEE transactions on visualization and computer graphics, vol. 21, no. 1, pp. 43–55, 2015.
 [3] X. Lu, Z. Deng, and W. Chen, “A robust scheme for featurepreserving mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 3, pp. 1181–1194, 2016.

 [4] M. Wei, L. Liang, W.-M. Pang, J. Wang, W. Li, and H. Wu, “Tensor voting guided mesh denoising,” IEEE Transactions on Automation Science and Engineering, 2016.
 [5] L. He and S. Schaefer, “Mesh denoising via l0 minimization,” ACM Transactions on Graphics (TOG), vol. 32, no. 4, p. 64, 2013.
 [6] R. Wang, Z. Yang, L. Liu, J. Deng, and F. Chen, “Decoupling noise and features via weighted l1analysis compressed sensing,” ACM Transactions on Graphics (TOG), vol. 33, no. 2, p. 18, 2014.
 [7] S. Yoshizawa, A. Belyaev, and H.P. Seidel, “Smoothing by example: Mesh denoising by averaging with similaritybased weights,” in Shape Modeling and Applications, 2006. SMI 2006. IEEE International Conference on. IEEE, 2006, pp. 9–9.
 [8] G. Rosman, A. Dubrovina, and R. Kimmel, “Patchcollaborative spectral pointcloud denoising,” in Computer Graphics Forum, vol. 32, no. 8. Wiley Online Library, 2013, pp. 1–12.
 [9] J. Digne, “Similarity based filtering of point clouds,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on. IEEE, 2012, pp. 73–79.
 [10] W. Zhang, B. Deng, J. Zhang, S. Bouaziz, and L. Liu, “Guided mesh normal filtering,” in Computer Graphics Forum, vol. 34, no. 7. Wiley Online Library, 2015, pp. 23–34.
 [11] T. Li, J. Wang, H. Liu, and L.g. Liu, “Efficient mesh denoising via robust normal filtering and alternate vertex updating,” Frontiers of Information Technology and Electronic Engineering, vol. 18, no. 11, pp. 1828–1842, 2017.
 [12] X. Lu, W. Chen, and S. Schaefer, “Robust mesh denoising via vertex prefiltering and l1median normal filtering,” Computer Aided Geometric Design, vol. 54, no. Supplement C, pp. 49 – 60, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167839617300638
 [13] S. K. Yadav, U. Reitebuch, and K. Polthier, “Robust and high fidelity mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. PP, no. 99, pp. 1–1, 2017.
 [14] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2015.
 [15] K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN based image denoising,” IEEE Transactions on Image Processing, vol. 27, no. 9, pp. 4608–4622, 2017.
 [16] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, vol. PP, no. 99, pp. 1–1, 2017.
 [17] S. Fleishman, I. Drori, and D. Cohen-Or, “Bilateral mesh denoising,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3. ACM, 2003, pp. 950–953.
 [18] T. R. Jones, F. Durand, and M. Desbrun, “Noniterative, featurepreserving mesh smoothing,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3. ACM, 2003, pp. 943–949.
 [19] Y. Zheng, H. Fu, O. K.C. Au, and C.L. Tai, “Bilateral normal filtering for mesh denoising,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 10, pp. 1521–1530, 2011.
 [20] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, “Digital photography with flash and noflash image pairs,” in ACM SIGGRAPH 2004 Papers, ser. SIGGRAPH ’04. New York, NY, USA: ACM, 2004, pp. 664–672.
 [21] H. Zhang, C. Wu, J. Zhang, and J. Deng, “Variational mesh denoising using total variation and piecewise constant function space,” IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 7, pp. 873–886, 2015.
 [22] P.S. Wang, Y. Liu, and X. Tong, “Mesh denoising via cascaded normal regression,” ACM Transactions on Graphics (SIGGRAPH Asia), vol. 35, no. 6, 2016.
 [23] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 945–953.
 [24] B. Shi, S. Bai, Z. Zhou, and X. Bai, “Deeppano: Deep panoramic representation for 3d shape recognition,” IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2339–2343, Dec 2015.
 [25] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes.” IEEE Computer Society, pp. 1912–1920.
 [26] D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for realtime object recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 922–928.
 [27] P. Wang, Y. Liu, Y. Guo, C. Sun, and X. Tong, “O-CNN: octree-based convolutional neural networks for 3d shape analysis,” CoRR, vol. abs/1712.01537, 2017. [Online]. Available: http://arxiv.org/abs/1712.01537
 [28] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu, “Highresolution shape completion using deep neural networks for global structure and local geometry inference,” CoRR, vol. abs/1709.07599, 2017.
 [29] Q. Tan, L. Gao, Y. Lai, J. Yang, and S. Xia, “Meshbased autoencoders for localized deformation component analysis,” CoRR, vol. abs/1709.04304, 2017.
 [30] B. Davide, M. Jonathan, R. Emanuele, B. M. M., and C. Daniel, “Anisotropic diffusion descriptors,” Computer Graphics Forum, 2016.
 [31] L. Yi, H. Su, X. Guo, and L. J. Guibas, “Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation,” CoRR, vol. abs/1612.00606, 2016.
 [32] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” CoRR, vol. abs/1801.07829, 2018.
 [33] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” CoRR, vol. abs/1612.00593, 2016.
 [34] B. Hua, M. Tran, and S. Yeung, “Pointwise convolutional neural network,” CoRR, vol. abs/1712.05245, 2017.
 [35] R. Klokov and V. S. Lempitsky, “Escape from cells: Deep kd-networks for the recognition of 3d point cloud models,” in IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 863–872.
 [36] G. Riegler, A. O. Ulusoy, and A. Geiger, “Octnet: Learning deep 3d representations at high resolutions,” CoRR, vol. abs/1611.05009, 2016.
 [37] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Computer Vision, 1998. Sixth International Conference on. IEEE, 1998, pp. 839–846.
 [38] X. Sun, P. Rosin, R. Martin, and F. Langbein, “Fast and effective featurepreserving mesh denoising,” IEEE transactions on visualization and computer graphics, vol. 13, no. 5, 2007.
 [39] T. AkenineMöller, “Fast 3d trianglebox overlap testing,” in ACM SIGGRAPH 2005 Courses, ser. SIGGRAPH ’05. New York, NY, USA: ACM, 2005. [Online]. Available: http://doi.acm.org/10.1145/1198555.1198747
 [40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
 [41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for largescale image recognition,” CoRR, vol. abs/1409.1556, 2014.