Local Frequency Interpretation and Non-Local Self-Similarity on Graph for Point Cloud Inpainting

09/28/2018 ∙ by Zeqing Fu, et al. ∙ Peking University

As 3D scanning devices and depth sensors mature, point clouds have attracted increasing attention as a format for 3D object representation, with applications in various fields such as tele-presence, navigation and heritage reconstruction. However, point clouds usually exhibit holes of missing data, mainly due to the limitations of acquisition techniques and complicated structures. Further, point clouds are defined on irregular non-Euclidean domains, which are challenging to address, especially with conventional signal processing tools. Hence, leveraging recent advances in graph signal processing, we propose an efficient point cloud inpainting method that exploits both the local smoothness and the non-local self-similarity in point clouds. Specifically, we first propose a frequency interpretation in the graph nodal domain, based on which we introduce a local graph-signal smoothness prior to describe the local smoothness of point clouds. Secondly, we explore the characteristics of non-local self-similarity by globally searching for the area most similar to the missing region. The similarity metric between two areas is defined based on the direct component and the anisotropic graph total variation of the normals in each area. Finally, we formulate the hole-filling step as an optimization problem based on the selected most similar area, regularized by the graph-signal smoothness prior. In addition, we propose voxelization and automatic hole detection methods applied to the point cloud prior to inpainting. Experimental results show that the proposed approach significantly outperforms four competing methods in both objective and subjective quality.






I Introduction

Graphs are generic data representation forms for describing pairwise relations in data, consisting of vertices and connecting edges. The weight associated with each edge in a graph often represents the similarity between the two vertices it connects. Nowadays, graphs are used to model various types of relations in physical, biological, social and information systems because of their advantages in handling irregular discrete signals. The connectivities and edge weights are either dictated by the physics (e.g., the physical distance between nodes) or inferred from the data. The data residing on a graph is a finite collection of samples, with one sample at each vertex, which is referred to as a graph signal [1]. Compared with conventional signals (e.g., images and videos) defined on regular grids, graph signals are often defined on non-Euclidean domains where the neighborhood is unclear and the ordering might be unimportant. Statistical properties such as stationarity and compositionality are not even well defined for graph signals. Such data arise in diverse engineering and science fields, such as logistics data in transportation networks, functionalities in neural networks, temperatures on sensor networks, pixel intensities in images, etc.

For any undirected graph, there is an orthogonal basis corresponding to the eigenvectors of a graph matrix¹.

¹In spectral graph theory [2], a graph matrix is a matrix describing the connectivities in the graph, such as the commonly used graph Laplacian matrix, which will be introduced in Section III.

Thus once a graph has been defined, a definition of frequency is readily available, which allows us to extend conventional signal processing to graph signals. Multiple definitions are possible, such as the cut size of the constructed orthogonal basis set [3] and the total variation of the associated spectral component [4]. These definitions accurately reflect the relationship among the frequencies of different eigenvectors for certain kinds of graphs, but are inapplicable to others. Hence, choosing an appropriate local frequency interpretation for a given application, such as inpainting in our context, remains an open question that is actively investigated.

As local information is often not enough for applications like inpainting, we also resort to the wisdom of non-local self-similarity [5]. This idea rests on the assumption of non-local stationarity, and has been widely used in image processing tasks such as denoising, inpainting, texture segmentation and authentication [6, 7, 8, 9, 10, 11]. However, non-local self-similarity on graphs is challenging, because it is difficult to define stationarity on irregular graphs. Qi et al. [12] introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes, but their graph signals are defined on regular Euclidean domains. Pang et al. [13] define self-similarity in images via regular graphs, but there is no such definition for high-dimensional and irregular signals yet.

Fig. 1: Point cloud Phili (a) with three self-similarity examples (b) (c) (d). (b) is an example with a hole due to the inherent limitations of the acquisition equipment. (c) (d) are regions non-locally self-similar to (b).

In this paper, we propose to exploit both local frequency interpretation and non-local self-similarity on graphs for point cloud inpainting. Point clouds have received increasing attention as an efficient representation of 3D objects. A point cloud consists of a set of points, each of which corresponds to a measurement point, with 3D coordinates representing the geometric information and possibly attribute information such as color and normal. The development of depth sensing and 3D laser scanning² techniques enables the convenient acquisition of point clouds, thus catalyzing their applications in various fields, such as 3D immersive tele-presence, navigation in autonomous driving, and heritage reconstruction [14]. However, point clouds often inevitably exhibit holes of missing data, as shown in Fig. 1 (b). This is mainly due to incomplete scanning views and inherent limitations of the acquisition equipment³. Besides, some regions may be missing in the data itself (e.g., dilapidated heritage). Therefore, it is necessary to inpaint incomplete point clouds prior to subsequent applications.

²Examples include Microsoft Kinect, Intel RealSense, LiDAR, etc.

³For example, 3D laser scanning is less sensitive to objects in dark colors: the darker a surface is, the more light it absorbs and the less it reflects, so laser scanning devices cannot receive enough reflected light from dark objects to recognize them.

Our proposed point cloud inpainting method makes three main contributions. Firstly, we present a frequency interpretation in the graph nodal domain, in which we prove the positive correlation between the eigenvalues and the frequencies of the eigenvectors of the graph Laplacian. Based on this correlation, we discuss a graph-signal smoothness prior from the perspective of suppressing high-frequency components, so as to smooth and denoise the inpainted area at the same time.

Secondly, we propose a point cloud inpainting method that exploits the non-local self-similarity in the geometry of point clouds, building on our previous work [15]. We globally search for a source region that is the most similar to the target region (i.e., the region containing the hole), and then fill the hole by formulating an optimization problem involving the source region and the aforementioned graph-signal smoothness prior. In particular, we define the similarity metric between two regions on a graph based on 1) the direct component (DC) of the normals of points and 2) the anisotropic graph total variation (AGTV) of the normals, which is an extension of the graph total variation [16, 17, 18]. Further, we improve our previous work by 1) mirroring the filtered cubes (the processing unit in our method) so as to augment the non-local self-similar candidates; and 2) registering the geometric structure of the source cube and the target cube in order to strengthen the similarity between them for better inpainting, which includes a translation obtained from the difference in location between the source cube and the target cube, and a rotation obtained by the proposed simplified Iterative Closest Points (ICP) method [19, 20].

Thirdly, we propose voxelization and automatic hole detection prior to point cloud inpainting. Specifically, we perturb each point in the point cloud onto regular grids so as to facilitate the subsequent processing. Thereafter, we design a hole detection method for the voxelized point cloud based on Breadth-First Search (BFS). We project the 3D point cloud to a 2D depth map along the principal projection direction, search for holes in the 2D depth map via BFS, and finally obtain the 3D holes by calculating the coordinate range of each hole along the projection direction. Experimental results show that our approach significantly outperforms four competing methods in both objective and subjective quality.

The outline of the paper is as follows. We first discuss previous methods in Section II. Then we propose our frequency interpretation in Section III and overview the proposed point cloud inpainting framework in Section IV. Next, we present our voxelization and hole detection methods in Section V and Section VI, respectively. In Section VII, we elaborate on the proposed inpainting approach, including preprocessing, non-local similar cube searching, structure matching and problem formulation. Finally, experimental results and the conclusion are presented in Sections VIII and IX, respectively.

II Related Work

We review previous work on hole detection and on inpainting in turn.

II-A Hole Detection

Existing hole detection methods for point clouds include mesh-based methods and point-based methods. The idea of mesh-based methods is to construct a mesh over the point cloud, extract the hole boundary by some greedy strategy, and treat points not surrounded by triangle chains as boundary feature points [21, 22, 23]. Nevertheless, the computational complexity is high, and the extracted boundaries are not accurate enough when the point distribution is uneven. Point-based methods are thus proposed to perform hole detection on the point cloud directly. [24, 25, 26] extract the exterior boundary of the surface based on the neighborhood relationship among the 3D points, and then detect the hole boundary by a growth function on the interior boundary of the surface. However, these methods are unable to distinguish holes from closed edges. Bendels et al. [27] compute a boundary probability for each point, and construct closed loops circumscribing the hole in a shortest-cost-path manner. [28] and [29] detect the K-nearest neighbors of each point and calculate the connections between the neighbors; when the angle between two adjacent edges exceeds a threshold, the point is regarded as a boundary point. Nevertheless, these methods [27, 28, 29] often treat points in sparse areas as the hole boundary, and the detected boundary points are likely to be incomplete and incoherent.

II-B Point Cloud Inpainting

Few point cloud inpainting methods have been proposed; they fall into two main categories according to the cause of holes: (1) restoring holes in the object itself, such as heritage and sculptures [30, 31, 32, 33, 34], and (2) inpainting holes caused by the limitations of scanning devices [35, 36, 37]. For the first class of methods, the main hole-filling data source is an online database, as the holes are often large. Sahay et al. [30] attempt to fill large damaged regions using neighbourhood surface geometry and geometric priors derived from registered similar examples in a library. They later propose another method [31], which projects the point cloud to a depth map, searches for a similar one in an online depth database via dictionary learning, and then adaptively propagates the local 3D surface smoothness. However, the projection process inevitably introduces geometric loss. Instead, Dinesh et al. [34] fill this kind of hole using only the data in the object itself. In particular, they give priority to points along the hole boundary to determine the inpainting order, and find best-matching regions based on an exemplar, which are used to fill the missing area. The results still suffer from geometric distortion due to the limited data source.

The other class of methods focuses on holes generated by the limitations of scanning devices. These holes are generally smaller than the aforementioned ones, so the information in the data itself is often sufficient for inpainting. Wang et al. [35] create a triangle mesh from the input point cloud, identify the vicinity of the hole and interpolate the missing area based on a moving least squares approach. Lozes et al. [36] deploy partial difference operators to solve an optimization problem on the point cloud, which refers only to the neighborhood of the hole to compute the geometric structure. Lin et al. [37] segment a hole into several sub-holes, and then fill the sub-holes by extracting geometric features separately with tensor voting. Because the reference information comes from the local neighborhood only, the results of these methods tend to be more planar than the ground truth. Also, artifacts are likely to occur around the boundary when the geometric structure is complicated.

In addition, some works deal with particular point cloud data. For instance, flattened bar-shaped holes in human body data are studied in [29], where the holes are filled by iteratively calculating the extension direction of the boundary points from their neighboring points. [38] mainly focuses on geometrically regular and planar point clouds of buildings, filling holes caused by obstructions such as trees; it searches for non-local similar blocks by performing Fourier analysis on each scanline during real-time scanning to detect the periodicity of planar scene elements. These methods have higher data requirements and are thus unsuitable for general cases.

III Frequency Interpretation in Graph Nodal Domain

We first review basic concepts in spectral graph theory, including graphs, the graph Laplacian and the graph-signal smoothness prior, and then propose the frequency interpretation of the prior in the graph nodal domain, which will be utilized in the proposed point cloud inpainting.

III-A Preliminary

Fig. 2: A k-NN graph constructed with k = 4. The connections of boundary vertices are omitted.

We consider an undirected graph G = {V, E, W} composed of a vertex set V of cardinality |V| = N, an edge set E connecting vertices, and a weighted adjacency matrix W. W is a real symmetric N x N matrix, where w_ij is the weight assigned to the edge connecting vertices i and j. We assume non-negative weights, i.e., w_ij >= 0. For example, the k-Nearest Neighbor (k-NN) graph is a common undirected graph, constructed by connecting each point with its k nearest neighbors, as shown in Fig. 2.

The Laplacian matrix, defined from the adjacency matrix, can be used to uncover many useful properties of a graph [2]. Among the different variants of Laplacian matrices, the combinatorial graph Laplacian used in [39, 40, 41] is defined as L = D - W, where D is the degree matrix: a diagonal matrix with d_ii = sum_j w_ij. We choose to adopt the combinatorial graph Laplacian in our problem formulation, because it provides energy compaction in the low-frequency components when integrated into the graph-signal smoothness prior, as discussed next. This property enables the structure-adaptive smoothing of the transition from the known region to the inpainted region.
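To make the construction concrete, the following Python sketch builds an unweighted k-NN graph over a set of 3D points and assembles the combinatorial Laplacian L = D - W. The function name and the use of SciPy's cKDTree are our own illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_laplacian(points, k=4):
    """Build an unweighted, symmetrized k-NN graph over 3D points and
    return its combinatorial Laplacian L = D - W. A sketch; the paper's
    graphs may use weighted edges instead of the 0/1 edges used here."""
    tree = cKDTree(points)
    n = len(points)
    W = np.zeros((n, n))
    # query k+1 neighbors because the nearest neighbor of a point is itself
    _, idx = tree.query(points, k=k + 1)
    for i, neighbors in enumerate(idx):
        for j in neighbors[1:]:
            W[i, j] = W[j, i] = 1.0  # symmetrize: the graph is undirected
    D = np.diag(W.sum(axis=1))       # degree matrix d_ii = sum_j w_ij
    return D - W
```

The returned matrix is symmetric with zero row sums and non-negative eigenvalues, the properties the spectral analysis in this section relies on.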

III-B Graph-Signal Smoothness Prior and Its Frequency Interpretation

A graph signal refers to data residing on the vertices of a graph. For example, if we construct a k-NN graph on a point cloud, then the normal of each point can be treated as a graph signal defined on the k-NN graph. This will be discussed further in the proposed cube matching approach in Section VII-B.

A graph signal x defined on a graph G is smooth with respect to the topology of G if

    sum_{i~j} w_ij (x_i - x_j)^2 <= epsilon,    (1)

where epsilon is a small positive scalar, and i ~ j denotes that vertices i and j are one-hop neighbors in the graph. In order to satisfy (1), x_i and x_j have to be similar for a large edge weight w_ij, and could be quite different for a small w_ij. Hence, (1) enforces x to adapt to the topology of G, which is thus coined the graph-signal smoothness prior. As in [42], (1) is concisely written as x^T L x <= epsilon in the sequel.
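The equivalence between the edge-wise sum and the quadratic form x^T L x can be checked numerically with a short sketch (the function name and setup are illustrative):

```python
import numpy as np

def smoothness_prior(W, x):
    """Evaluate the graph-signal smoothness prior in two equivalent ways:
    as the edge-wise sum over w_ij * (x_i - x_j)^2 with each undirected
    edge counted once, and as the quadratic form x^T L x with L = D - W.
    Returns both values; they coincide for symmetric W with zero diagonal."""
    n = W.shape[0]
    edge_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):        # each undirected edge once
            edge_sum += W[i, j] * (x[i] - x[j]) ** 2
    L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
    quad = float(x @ L @ x)
    return edge_sum, quad
```

This identity is why the prior can be written compactly as x^T L x.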

Despite the smoothing property, this prior also has a denoising functionality. By the definition of eigenvectors, we have

    L U = U Lambda,    (2)

where U is the matrix of the eigenvectors of L, and Lambda = diag(lambda_1, ..., lambda_N) is the diagonal matrix of eigenvalues of L with 0 = lambda_1 <= lambda_2 <= ... <= lambda_N. Because L is a real symmetric matrix, U^{-1} = U^T. Then we have

    L = U Lambda U^T.    (3)

The eigenvectors are then used to define the graph Fourier transform (GFT). Formally, for any signal x residing on the vertices of G, its GFT x_hat is defined in [43] as

    x_hat = U^T x,    (4)

and the inverse GFT follows as

    x = U x_hat.    (5)

Therefore, with (4) we have

    x^T L x = x_hat^T Lambda x_hat = sum_k lambda_k x_hat_k^2.    (6)

Hence, (1) can also be written as sum_k lambda_k x_hat_k^2 <= epsilon. To satisfy this when minimizing (1), x_hat_k must be smaller for larger lambda_k, so the terms in (5) corresponding to larger lambda_k are suppressed. Since the frequency of the eigenvector u_k is positively correlated with the eigenvalue lambda_k, as will be proved in Section III-C, the graph-signal smoothness prior suppresses high-frequency components, thus leading to a compact representation of x.
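The spectral reading of the prior can be verified numerically: the quadratic form x^T L x equals the eigenvalue-weighted energy of the GFT coefficients. A small illustrative sketch (names are ours):

```python
import numpy as np

def gft_energy_check(L, x):
    """Compute the smoothness prior two ways: as the quadratic form
    x^T L x in the vertex domain, and as sum_k lambda_k * xhat_k^2 in the
    spectral domain, where xhat = U^T x is the graph Fourier transform.
    Returns both values; they agree for any symmetric Laplacian L."""
    lam, U = np.linalg.eigh(L)           # L = U diag(lam) U^T, U orthogonal
    xhat = U.T @ x                       # GFT coefficients
    quad = float(x @ L @ x)              # vertex-domain expression
    spec = float(np.sum(lam * xhat ** 2))  # spectral expression
    return quad, spec
```

Minimizing either expression shrinks the coefficients attached to large eigenvalues, which is the denoising behavior discussed above.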

This prior will be deployed in our problem formulation of point cloud inpainting as a regularization term, as discussed in Section VII.

III-C Proof of Frequency Interpretation in Graph Nodal Domain

While it is known that a larger eigenvalue of the graph Laplacian corresponds to an eigenvector with higher frequency in general, there is no rigorous mathematical proof yet to the best of our knowledge. Hence, we prove it from the perspective of nodal domains [44] and have the following theorem:

Theorem    For the graph Laplacian, there is a positive correlation between the eigenvalues and the frequencies of the corresponding eigenvectors, where frequency is defined as the number of zero crossings over the graph nodal domains.

Proof    A positive (negative) strong nodal domain of the eigenvector u_k corresponding to lambda_k on G is a maximally connected subgraph such that u_k(i) > 0 (u_k(i) < 0) for all vertices i in the subgraph. A positive (negative) weak nodal domain of u_k on G is a maximally connected subgraph such that u_k(i) >= 0 (u_k(i) <= 0) for all vertices i in the subgraph, with u_k(i) != 0 for at least one vertex in the subgraph. As is well known, the number of zero crossings in a signal can represent its frequency. Similarly, the number of zero crossings over the nodal domains of the eigenvectors is able to interpret the frequency, as shown in Fig. 3.

Fig. 3: Two 1D examples (row 1 and row 2) of the relationship between the number of zero crossings in graph nodal domains of the eigenvectors (a) (c) and their corresponding frequencies (b) (d), in which the horizontal axis is the index in the eigenvector and the vertical axis is the eigenvector value on each index. (a) has 1 zero crossing while (c) has 2, thus (a) corresponds to lower frequency than (c), as demonstrated in (b) and (d).

In order to prove the positive correlation between the eigenvalue lambda_k and the frequency of the corresponding eigenvector u_k, we introduce the upper and lower bounds on the number of nodal domains N(u_k) of u_k:

  • Upper bound: k + r - 1 (r is the multiplicity of lambda_k) for strong nodal domains and k for weak nodal domains [44].

  • Lower bound: k - l - z (l is the minimal number of edges that need to be removed from G in order to turn it into a forest, and z is the number of zeros in u_k) for strong nodal domains [45].

Therefore, both the upper and lower bounds of N(u_k) are monotonically increasing in k. This means that N(u_k) grows with k in general, and thus the frequency of u_k is positively correlated with lambda_k, because the frequency is monotonic in the number of zero crossings over the nodal domains, and lambda_k is arranged in nondecreasing order with respect to k.
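Although the formal argument relies on the nodal-domain bounds, the correlation is easy to observe numerically by counting sign changes of the Laplacian eigenvectors across edges. A sketch for an arbitrary Laplacian (function name is illustrative):

```python
import numpy as np

def zero_crossings(L):
    """For each eigenvector of the Laplacian L, ordered by eigenvalue,
    count the number of edges whose endpoints take opposite signs.
    A numerical illustration of the theorem, not the nodal-domain proof."""
    lam, U = np.linalg.eigh(L)           # eigenvalues in ascending order
    n = L.shape[0]
    # edges = vertex pairs (i, j), i < j, with a nonzero off-diagonal entry
    edges = [(i, j) for i in range(n)
             for j in range(i + 1, n) if L[i, j] != 0]
    counts = []
    for k in range(n):
        v = U[:, k]
        counts.append(sum(1 for i, j in edges if v[i] * v[j] < 0))
    return lam, counts
```

On a path graph, for instance, the k-th eigenvector (a sampled cosine) crosses zero exactly k times, so the counts increase one-for-one with the eigenvalue index, matching the theorem's claim.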

IV Overview of the Proposed Inpainting Framework

Fig. 4: The framework of the proposed point cloud inpainting method.

We now introduce the proposed point cloud inpainting method based on the spectral graph theory in Section III. As shown in Fig. 4, the method consists of the following steps:

  • Firstly, we voxelize the input point clouds before inpainting to deal with the irregularity of point clouds.

  • Secondly, we detect the holes in the voxelized point cloud to inpaint in the next step.

  • Thirdly, we inpaint the missing area. (a) We split it into cubes of fixed size as units to be processed in the subsequent steps. (b) We choose the cube with missing data as the target cube. (c) We search for the cube most similar to the target cube in the point cloud, which is referred to as the source cube, based on the DC and AGTV of the normals of points. (d) The inpainting problem is formulated as an optimization problem, which leverages the source cube and the graph-signal smoothness prior. The closed-form solution of the optimization problem leads to the resulting cube.

  • Finally, we replace the target cube with the resulting cube as the output.

V Voxelization

In order to address the irregularity of point clouds, we first voxelize the point cloud before inpainting, i.e., we perturb each point in the point cloud onto regular grids, as shown in Fig. 5. The voxelization step has the following advantages:

  • Helps delineate missing areas accurately.

  • Facilitates the determination of the exact locations of points to be filled in.

  • Reduces the computational complexity caused by the irregularity of the point cloud.

Fig. 5: Demonstration of the proposed voxelization for point clouds. (a) The original point cloud. (b) The voxelized result.

The proposed voxelization consists of two steps. First, in order to distribute the points uniformly in the voxels, we preprocess the coordinates so that the distance between neighboring points is approximately uniform, and translate the point cloud to the origin. Then we voxelize the points and compute the attribute information for each voxelized point.

V-A Coordinate Preprocessing

For the convenience of the subsequent processing and data storage, we convert the coordinates into integers. We first expand or contract the original coordinates so that the distance between neighboring points is close to 1. The input data is a point cloud with a location matrix P (one row per point; n is the number of points in the original point cloud) and the associated attribute information. We then construct the k-NN graph mentioned in Section III-A over the original point cloud, and compute the average distance between connected vertices as d_bar. The coordinates are then converted to

    P_tilde = P / d_bar.
Next, we translate the scaled point cloud to the origin of the coordinate axes for simplicity. The translation vector is computed from the minimum coordinate in each dimension of the scaled locations, and the translated locations are obtained as

    P_hat = P_tilde - Delta,

where Delta is a translation matrix with the translation vector repeated in each row.

V-B Voxel Computation

We split the translated point cloud into voxels with unit side length. A voxel thus covers a set of points, each of which corresponds to attribute information. In each voxel, we replace the set of points with the centering point of the voxel as long as there are points within, thus obtaining the voxelized point cloud. Then we compute the attribute value for each voxel with centering location c as the weighted average

    a_c = sum_i w_i a_i / sum_i w_i,

where a_i is the attribute of the i-th point in the voxel, and w_i is a weight that decreases with the geometric distance between the point and the centering point c, controlled by a weighting parameter set empirically in our experiments. This is based on the assumption that geometrically closer points generally have more similar attributes.

The voxelized point cloud, with the voxel centers as its locations and the blended values as its attributes (the normal at each point in our case), is the final input data for the subsequent processing.
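The two-step voxelization can be sketched as follows. The Gaussian attribute weighting and the parameter sigma are stand-ins, since the paper's exact weighting function and parameter value are not reproduced here:

```python
import numpy as np

def voxelize(points, attrs, sigma=1.0):
    """Voxelization sketch: scale so the mean nearest-neighbor distance is 1,
    translate to the origin, snap each point to the center of its unit voxel,
    and blend attributes with distance-decaying weights. The Gaussian weight
    and sigma are assumed stand-ins for the paper's weighting function."""
    # scale: mean nearest-neighbor distance -> 1
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    scale = d.min(axis=1).mean()
    q = points / scale
    q = q - q.min(axis=0)                # translate to the origin
    centers = np.floor(q) + 0.5          # voxel center for each point
    voxels = {}
    for c, p, a in zip(map(tuple, centers), q, attrs):
        w = np.exp(-np.sum((p - np.array(c)) ** 2) / sigma ** 2)
        acc = voxels.setdefault(c, [0.0, np.zeros_like(a, dtype=float)])
        acc[0] += w                      # accumulate weight
        acc[1] = acc[1] + w * a          # accumulate weighted attribute
    locs = np.array(list(voxels.keys()))
    vals = np.array([s / w for w, s in voxels.values()])
    return locs, vals
```

The output pair (voxel centers, blended attributes) corresponds to the final input data described above.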

VI Automatic Hole Detection

We propose hole detection based mainly on BFS [46]. As shown in Fig. 6, we first project the 3D point cloud to a 2D depth map along a projection direction d. While d could be any of the six projection directions along the coordinate axes, we deploy Principal Component Analysis (PCA) [47] on the normals to choose the best projection direction, in order to reduce the computational complexity. PCA statistically transforms a set of possibly correlated variables into a set of linearly uncorrelated variables by an orthogonal transformation; the first principal component serves as the projection direction in our method. Then we search for holes using BFS, and differentiate holes by marking the points in each hole with different numbers. Finally, we determine the extent of each detected hole along d by calculating the coordinate range of the hole along d, which gives the hole region in the 3D point cloud. We elaborate on the BFS-based detection method as follows.

Fig. 6: The projection step of automatic hole detection for point clouds. (a) The original point cloud. (b) The depth map obtained along the principal projection direction.

There are two kinds of pixels in the obtained depth map: pixels corresponding to points in the point cloud, and pixels with no correspondence (i.e., corresponding to holes). Hence, we propose to detect a hole in the depth map by searching for a connected set of pixels with no correspondence, where two pixels are considered connected if they have similar values and are located at one-hop neighboring positions in the horizontal, vertical or diagonal directions. We solve this problem by BFS, as shown in Fig. 7. We first mark pixels with a correspondence as 1 and the other pixels as 0. Then we traverse the pixels in the depth map until we find the first 0-marked pixel, and change its mark to 2 for the first hole (similarly, 3 for the second hole, 4 for the third hole, etc.) so as to distinguish different holes. Next, we search for the connected 0-marked pixels among its one-hop neighbors, and update their marks to 2. We repeat this operation until there are no 0-marked neighbors of the 2-marked pixels; thus the first hole is detected. Similarly, we continue to detect the second hole by searching for the next 0-marked pixel in the depth map, and change its mark to 3; all the connected 0-marked pixels in its neighborhood are then updated to 3. The same goes for the rest of the holes.

All the holes are now detected via BFS. However, some of them contain only a few pixels in the depth map, and might be detected merely due to the low density of the point cloud. Therefore, we filter out holes whose number of missing pixels is below a density-dependent threshold (empirically set to 4, because the distance between two adjacent points after voxelization is generally less than 2). Besides, some detected holes are actually pixels corresponding to the outside of the object. These are distinguished from actual holes by checking whether they include any pixel at the boundary of the depth map, and are then filtered out. The remaining holes will be inpainted.
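The BFS labeling and the two filtering rules can be sketched as follows (mask value 1 marks pixels with a correspondence, 0 marks empty ones; the function name and details beyond the description above are illustrative):

```python
from collections import deque

def label_holes(mask, min_size=4):
    """BFS flood-fill labeling of empty depth-map pixels (mask value 0).
    8-connected holes receive labels 2, 3, ...; holes touching the map
    border (outside of the object) and holes smaller than min_size are
    discarded, mirroring the filtering described above. A sketch."""
    h, w = len(mask), len(mask[0])
    label = 2
    holes = []
    seen = [[m == 1 for m in row] for row in mask]   # skip filled pixels
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            queue, pixels, touches_border = deque([(sy, sx)]), [], False
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True            # outside of the object
                for dy in (-1, 0, 1):                # one-hop neighbors,
                    for dx in (-1, 0, 1):            # incl. diagonals
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if not touches_border and len(pixels) >= min_size:
                holes.append((label, pixels))
                label += 1
    return holes
```

Each returned entry pairs a hole label with its pixel set, from which the 3D coordinate range of the hole can then be computed.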

Fig. 7: The marking step of the BFS-based hole detection method. (a) The initial marking around the hole at the beginning of detection, where the first marked pixel is shown in red. (b) The final marking around the hole at the end of detection, where the first marked pixel is shown in red, the second in pink, the third in blue, the fourth in green and the fifth in yellow.

VII The Proposed Inpainting Method

We now elaborate on the proposed point cloud inpainting method based on the spectral graph theory in Section III. As shown in Fig. 4, the input data is the voxelized point cloud, in which each point has 3D coordinates. Firstly, we split it into fixed-size cubes as units to be processed in the subsequent steps. Secondly, we choose the target cube containing the missing area detected in Section VI. Thirdly, we search the point cloud for the cube most similar to the target cube, referred to as the source cube, based on the DC and AGTV of the normals of points. Fourthly, the inpainting problem is formulated as an optimization problem, which leverages the source cube and the graph-signal smoothness prior; its closed-form solution gives the resulting cube. Finally, we replace the target cube with the resulting cube as the output.

VII-A Preprocessing

We first split the input point cloud into overlapping cubes, which serve as the processing units of the proposed inpainting algorithm; the cube size is set empirically according to the coordinate range of the point cloud, as is the overlapping step. This is a trade-off between the computational complexity and ensuring that enough geometric information is available to search for the most similar cube.

Having obtained the cubes, we choose a cube with missing data as the target cube. Further, in order to reduce the computational complexity and increase the accuracy of the subsequent cube matching, we obtain candidate cubes by filtering out cubes whose number of points is less than 80% of that of the target cube, and augment the candidates by mirroring the remaining cubes with respect to the x-y plane, which will be used in the next step.

VII-B Non-Local Similar Cube Searching

In order to search for the cube most similar to the target, we first define the geometric similarity metric between the target cube and each candidate cube in terms of two quantities: the difference in DC and the difference in AGTV between the two cubes, where smaller differences indicate larger similarity. We explain the DC and the AGTV in detail as follows.

Direct Component   The DC is a function of the normals of the points in a cube, which represents the prominent geometric direction of the cube. A cube consists of a set of m points, each of which corresponds to a normal n_i. We compute the DC of the cube as the average of the normals:

    DC = (1/m) sum_{i=1}^{m} n_i.
Anisotropic Graph Total Variation   Unlike the traditional total variation used in image denoising [48] and edge extraction [49], the graph total variation (GTV) generalizes to describe the smoothness of a graph signal with respect to the underlying graph structure. Given the local gradient of a graph signal x at node i, whose element corresponding to a neighbor j is

    (grad_i x)_j = sqrt(w_ij) (x_j - x_i),  j ~ i,

the conventional isotropic GTV [17] is defined as

    GTV(x) = sum_i ||grad_i x||_2.

Further, in order to describe the geometric features of point clouds more accurately, we deploy the AGTV of the normals in a cube instead of the conventional GTV. The AGTV in our method replaces the local l2 norm with the l1 norm:

    AGTV(n) = sum_i ||grad_i n||_1 = sum_i sum_{j~i} sqrt(w_ij) ||n_j - n_i||_1.

Here w_ij denotes the weight of the edge between i and j in the graph we construct over the cube. Specifically, we build the k-NN graph mentioned in Section III-A based on the geometric distances among the points in the cube. Also, we consider unweighted graphs for simplicity, i.e., w_ij = 1 if i and j are connected, and w_ij = 0 otherwise.

Note that the AGTV is the l1 norm of the overall difference in normals over the graph. Hence, it favors the sparsity of the overall graph gradient of the normals, while the conventional GTV favors the sparsity of the local gradients. Therefore, the AGTV is able to describe abrupt signal changes efficiently.
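A minimal sketch of the two cube descriptors follows. It assumes the AGTV is taken as the plain l1 norm of normal differences over an unweighted edge list, and combines the two differences with equal weight; both are assumptions for illustration, since the paper's exact norm and weighting are not reproduced here:

```python
import numpy as np

def dc_and_agtv(normals, edges):
    """Compute the direct component (mean normal) of a cube and an
    anisotropic graph total variation over an unweighted edge list, taken
    here as the l1 norm of normal differences across edges (an assumed
    simplification of the paper's definition)."""
    dc = normals.mean(axis=0)
    agtv = sum(np.abs(normals[i] - normals[j]).sum() for i, j in edges)
    return dc, agtv

def cube_distance(normals_a, edges_a, normals_b, edges_b):
    """Dissimilarity between two cubes: difference in DC plus difference
    in AGTV, with equal weighting assumed for illustration. Smaller means
    more similar, so the source cube minimizes this value."""
    dc_a, tv_a = dc_and_agtv(normals_a, edges_a)
    dc_b, tv_b = dc_and_agtv(normals_b, edges_b)
    return float(np.linalg.norm(dc_a - dc_b) + abs(tv_a - tv_b))
```

A perfectly flat cube (identical normals) has zero AGTV, and two identical cubes have zero dissimilarity, which matches the intended behavior of the metric.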

Having computed the similarity metric between the target cube and the candidate cubes, we choose the candidate cube with the largest similarity as the source cube. However, the source cube cannot be directly adopted for inpainting, because it is the most similar to the target only in geometric structure, not in its relative location within the cube. Hence, we further perform structure matching (i.e., coarse registration) between the source cube and the target cube so as to match the relative locations, as discussed in the following.

VII-C Structure Matching

We consider matching and using both translation and rotation, as shown in Fig. 8. Firstly we translate by the difference in location between and . The difference in each dimension, denoted as , is computed as


where $\mathbf{B}$ is a diagonal matrix extracting the boundary within one hop of the missing region in the cube, and $K$ is the number of boundary points; $\mathbf{B}$ and $K$ are defined in the same way as in (19). Thus we obtain the translated cube $\mathbf{c}_s'$ as

$$\mathbf{c}_s' = \mathbf{c}_s + \mathbf{T},$$
where $\mathbf{T}$ is a translation matrix with $\mathbf{t} = [t_x, t_y, t_z]$ repeated in each row.

Fig. 8: The structure matching for the source cube. (a) The original cube. (b) The translated cube. (c) The translated and rotated cube.

Then we compute the rotation matrix $\mathbf{R}$ for $\mathbf{c}_s'$ using the proposed simplified ICP method, which calculates the rotation matrix once and for all. The rotation matrix is obtained from the quaternion associated with the covariance matrix between the two cubes. Firstly, we choose three pairs of points in $\mathbf{c}_t$ and $\mathbf{c}_s'$ as the control points for registration. We randomly select three points in $\mathbf{c}_t$ as $\{\mathbf{p}_i\}_{i=1}^{3}$, and then find the closest point in the relative position for each of them in $\mathbf{c}_s'$ as $\{\mathbf{q}_i\}_{i=1}^{3}$. Secondly, we compute the covariance matrix between $\{\mathbf{p}_i\}$ and $\{\mathbf{q}_i\}$ as

$$\boldsymbol{\Sigma} = \frac{1}{3} \sum_{i=1}^{3} \mathbf{p}_i \mathbf{q}_i^T - \boldsymbol{\mu}_p \boldsymbol{\mu}_q^T,$$
where $\boldsymbol{\mu}_p$ and $\boldsymbol{\mu}_q$ are the centroids of $\{\mathbf{p}_i\}$ and $\{\mathbf{q}_i\}$, respectively. From $\boldsymbol{\Sigma}$ we form the symmetric $4 \times 4$ matrix $Q(\boldsymbol{\Sigma})$ [19]:

$$Q(\boldsymbol{\Sigma}) = \begin{bmatrix} \operatorname{tr}(\boldsymbol{\Sigma}) & \boldsymbol{\Delta}^T \\ \boldsymbol{\Delta} & \boldsymbol{\Sigma} + \boldsymbol{\Sigma}^T - \operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{I}_3 \end{bmatrix}, \quad \boldsymbol{\Delta} = \begin{bmatrix} \Sigma_{23} - \Sigma_{32} \\ \Sigma_{31} - \Sigma_{13} \\ \Sigma_{12} - \Sigma_{21} \end{bmatrix}.$$
Then the unit eigenvector $\mathbf{q} = [q_0\ q_1\ q_2\ q_3]^T$ corresponding to the maximum eigenvalue of the matrix $Q(\boldsymbol{\Sigma})$ is adopted as the optimal rotation quaternion. Finally, the rotation matrix $\mathbf{R}$ corresponding to $\mathbf{q}$ is computed as in [19]:

$$\mathbf{R} = \begin{bmatrix} q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & q_0^2+q_2^2-q_1^2-q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2+q_3^2-q_1^2-q_2^2 \end{bmatrix}.$$
Since $\|\mathbf{q}\|_2 = 1$, we have $\mathbf{R}^T \mathbf{R} = \mathbf{I}$. Hence, the rotated cube $\mathbf{c}_s''$ is finally computed by

$$\mathbf{c}_s'' = \mathbf{c}_s'\, \mathbf{R}^T.$$
Rotation might lead to points on irregular grids. In order to maintain the voxelized unity of the point cloud, we further voxelize $\mathbf{c}_s''$ as in Section V. Also, points falling outside the cube due to the rotation are removed. This leads to the final reference cube, denoted as $\mathbf{c}_r$, which will be adopted in the following inpainting step.
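For reference, the quaternion-based rotation estimate used above can be written compactly as follows. This is a sketch of the standard Besl–McKay construction [19] under the conventions reconstructed above, not the authors' code:

```python
import numpy as np

def rotation_from_pairs(P, Q):
    """Closed-form rotation aligning control points P to Q (rows are points),
    via the quaternion eigen-decomposition of Besl-McKay; a sketch."""
    Pc = P - P.mean(axis=0)            # center both point sets
    Qc = Q - Q.mean(axis=0)
    S = Pc.T @ Qc / len(P)             # cross-covariance matrix
    A = S - S.T
    delta = np.array([A[1, 2], A[2, 0], A[0, 1]])
    K = np.empty((4, 4))
    K[0, 0] = np.trace(S)
    K[0, 1:] = K[1:, 0] = delta
    K[1:, 1:] = S + S.T - np.trace(S) * np.eye(3)
    w, v = np.linalg.eigh(K)           # eigenvalues in ascending order
    q0, q1, q2, q3 = v[:, np.argmax(w)]  # unit quaternion of max eigenvalue
    return np.array([
        [q0*q0+q1*q1-q2*q2-q3*q3, 2*(q1*q2-q0*q3),         2*(q1*q3+q0*q2)],
        [2*(q1*q2+q0*q3),         q0*q0+q2*q2-q1*q1-q3*q3, 2*(q2*q3-q0*q1)],
        [2*(q1*q3-q0*q2),         2*(q2*q3+q0*q1),         q0*q0+q3*q3-q1*q1-q2*q2],
    ])
```

With noise-free correspondences of three non-collinear points, this recovers the exact rotation, and the result is always orthonormal since the quaternion is unit-norm.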

Vii-D Problem Formulation

Next, we cast the inpainting problem as an optimization problem, regularized by the graph-signal smoothness prior mentioned in Section III-A. This problem is formulated as

$$\min_{\mathbf{c}}\ \|\bar{\mathbf{S}}(\mathbf{c} - \mathbf{c}_t)\|_2^2 + \lambda_1 \|\mathbf{S}(\mathbf{c} - \mathbf{c}_r)\|_2^2 + \lambda_2\, \mathbf{c}^T \mathbf{L} \mathbf{c}, \qquad (25)$$
where $\mathbf{c}$ is the desired cube. $\mathbf{S}$ is a diagonal matrix extracting the missing region in $\mathbf{c}$ and $\mathbf{c}_r$, while $\bar{\mathbf{S}}$ is a diagonal matrix extracting the known region. $\lambda_1$ and $\lambda_2$ are two weighting parameters, which we set empirically in the experiments. Besides, $\mathbf{L}$ is the graph Laplacian matrix of $\mathbf{c}_r$, computed from a $k$-NN graph we construct over $\mathbf{c}_r$ in the same way as in (18).

The first term in (25) is a fidelity term, which ensures the desired cube $\mathbf{c}$ is close to $\mathbf{c}_t$ in the known region. The second term constrains the unknown region of $\mathbf{c}$ to be similar to that of $\mathbf{c}_r$. Further, the third term is the graph-signal smoothness prior, which enforces the structure of $\mathbf{c}$ to be smooth with respect to the graph constructed over $\mathbf{c}_r$. In other words, the third term aims to make the structure of $\mathbf{c}$ mimic that of $\mathbf{c}_r$. Besides, it also has a denoising effect on the cube, because the energy of the cube is compacted into low-frequency components, as analyzed in Section III-B.

(25) is an unconstrained quadratic programming problem. Taking the derivative of (25) with respect to $\mathbf{c}$ and setting it to zero, we have the closed-form solution:

$$\mathbf{c}^* = \left( \bar{\mathbf{S}}^T \bar{\mathbf{S}} + \lambda_1 \mathbf{S}^T \mathbf{S} + \lambda_2 \mathbf{L} \right)^{-1} \left( \bar{\mathbf{S}}^T \bar{\mathbf{S}}\, \mathbf{c}_t + \lambda_1 \mathbf{S}^T \mathbf{S}\, \mathbf{c}_r \right).$$
(25) is thus solved optimally and efficiently. Finally, we replace the target cube with the resulting cube $\mathbf{c}^*$, which serves as the output.
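The closed-form solution amounts to a single linear solve per coordinate. Below is a minimal sketch under the reconstructed formulation above; the weights `lam1` and `lam2` are placeholders, since the paper sets them empirically.

```python
import numpy as np

def inpaint_cube(c_t, c_r, missing_mask, L, lam1=1.0, lam2=0.1):
    """Closed-form minimizer of the quadratic inpainting objective.
    c_t, c_r: (N,3) target/reference cubes; missing_mask: (N,) bool;
    L: (N,N) graph Laplacian built over the reference cube."""
    S = np.diag(missing_mask.astype(float))    # extracts the missing region
    S_bar = np.eye(len(missing_mask)) - S      # extracts the known region
    A = S_bar.T @ S_bar + lam1 * S.T @ S + lam2 * L
    b = S_bar.T @ S_bar @ c_t + lam1 * S.T @ S @ c_r
    return np.linalg.solve(A, b)               # (N,3) inpainted cube
```

With `lam2 = 0` the solution simply copies the known region from the target cube and the hole from the reference cube; the Laplacian term then blends and smooths the two.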

Viii Experiments

Viii-a Experimental Setup

We evaluate the proposed method by testing on several point cloud datasets from Microsoft [50], Mitsubishi [51], Stanford [52], Yobi [53], JPEG Pleno [54] and the Visual Computing Lab [55]. We test on two types of holes: 1) real holes generated during the capturing process, which have no ground truth; and 2) synthetic holes in point clouds, so as to compare with the ground truth. In particular, the number of nearest neighbors $k$ is considered to be related to the number of existing points in the cube, and is set empirically in our experiments.

Further, we compare our method with four competing algorithms for 3D geometry inpainting: Meshfix [56], Meshlab [57], [35] and [36]. Note that Meshfix and Meshlab are mesh-based, so we convert point clouds to meshes via Meshlab prior to testing these algorithms, and then convert the inpainted meshes back to point clouds as the final output.

Viii-B Experimental Results

It is nontrivial to measure the geometric difference between point clouds objectively. We apply the geometric distortion metrics in [58] and [34], referred to as GPSNR and NSHD respectively, as the metrics for evaluation.

(a) Original
(b) Meshfix
(c) Meshlab
(d) [35]
(e) [36]
(f) Proposed
Fig. 9: Inpainting from different methods for Phili (row 1), Bimba (row 2), Bunny (row 3) with real holes marked in red circles, with one representative cube magnified.
(a) Hole
(b) Meshlab
(c) [35]
(d) [36]
(e) Proposed
(f) Ground Truth
Fig. 10: Inpainting from different methods for Sarah (row 1), House (row 2) with synthetic holes marked in red circles, with one representative cube magnified.

GPSNR measures the error between point clouds $\mathcal{A}$ and $\mathcal{B}$ based on an optimized PSNR:

$$\mathrm{GPSNR}(\mathcal{A}, \mathcal{B}) = 10 \log_{10} \frac{p^2}{\max\{ d(\mathcal{A}, \mathcal{B}),\ d(\mathcal{B}, \mathcal{A}) \}},$$
where $p$ is the intrinsic resolution of the point clouds and $d(\cdot,\cdot)$ is the mean squared point-to-plane distance. Thus, the higher the GPSNR, the smaller the gap between $\mathcal{A}$ and $\mathcal{B}$. Note that GPSNR can be negative, depending on the assignment of the peak value [58].
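Under this reading of [58], GPSNR can be sketched as follows; the peak definition and the symmetrization over both directions are assumptions here, not a verified reimplementation of [58].

```python
import numpy as np

def gpsnr(A, B, n_A, n_B, p):
    """Hedged GPSNR sketch: peak p over the symmetric mean squared
    point-to-plane error; n_A, n_B are per-point unit normals."""
    def mse_p2plane(X, Y, n_Y):
        d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        nn = d.argmin(axis=1)                      # nearest neighbor of each x in Y
        err = ((X - Y[nn]) * n_Y[nn]).sum(axis=1)  # signed distance along the normal
        return (err ** 2).mean()
    mse = max(mse_p2plane(A, B, n_B), mse_p2plane(B, A, n_A))
    return 10.0 * np.log10(p ** 2 / mse)
```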

NSHD (Normalized Symmetric Hausdorff Distance) is a normalized metric based on the one-sided Hausdorff distance [59]:

$$\mathrm{NSHD}(\mathcal{A}, \mathcal{B}) = \frac{\max\{ h(\mathcal{A}, \mathcal{B}),\ h(\mathcal{B}, \mathcal{A}) \}}{\sqrt[3]{V}},$$
where $V$ is the volume of the smallest axis-aligned parallelepiped enclosing the given 3D point cloud, and $h(\mathcal{A}, \mathcal{B})$ is the one-sided Hausdorff distance from point cloud $\mathcal{A}$ to $\mathcal{B}$:

$$h(\mathcal{A}, \mathcal{B}) = \max_{\mathbf{a} \in \mathcal{A}} \min_{\mathbf{b} \in \mathcal{B}} \|\mathbf{a} - \mathbf{b}\|_2 .$$
Thus, the lower the NSHD, the smaller the difference between $\mathcal{A}$ and $\mathcal{B}$.
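A sketch of the one-sided Hausdorff distance and the normalized symmetric variant, assuming the cube-root-of-volume normalization reconstructed above (the exact normalization in [34] may differ):

```python
import numpy as np

def one_sided_hausdorff(A, B):
    """h(A, B): the largest distance from a point in A to its nearest point in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()

def nshd(A, B):
    """Symmetric Hausdorff distance normalized by the cube root of the
    axis-aligned bounding-box volume of A (a sketch)."""
    V = np.prod(A.max(axis=0) - A.min(axis=0))
    return max(one_sided_hausdorff(A, B), one_sided_hausdorff(B, A)) / np.cbrt(V)
```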

Table I and Table II show the objective results for synthetic holes in GPSNR and NSHD, respectively. Our scheme significantly outperforms all the competing methods in both metrics. Specifically, in Table I we achieve 49.14 dB gain in GPSNR on average over Meshfix, 31.96 dB over Meshlab, 27.63 dB over [35], and 11.58 dB over [36]. In Table II, we produce much lower NSHD than the other methods, at least four times lower than the next best method, [36]. Note that the holes are synthesized so that we have the ground truth for objective comparison.

Further, Fig. 9 and Fig. 10 demonstrate subjective inpainting results for real holes and synthetic holes, respectively. The results of Meshfix and Meshlab exhibit non-uniform density, because they sometimes generate redundant mesh faces to fill wrong holes selected by their automatic hole detection algorithms. For the real holes in Fig. 9 (a), which are fragmentary, Meshfix and Meshlab are able to fill small holes with planar geometric structure, but they cannot fully inpaint large missing regions. Besides, [35] and [36] fill most of the holes by connecting the boundary with simple structure and a point density different from the original cloud. Our results in Fig. 9 (f) demonstrate that our method is able to inpaint both small and large holes with appropriate geometric structure and point density, even for holes with complicated geometry. This gives credit to the use of the non-local self-similarity and the graph-signal smoothness prior.

Meshfix Meshlab [35] [36] Proposed
Andrew 3.5201 25.5331 12.1282 46.1208 56.3579
David -10.7156 13.7398 27.0006 35.8101 44.1576
House -5.2012 17.2414 24.5564 35.1124 54.1287
Longdress 9.2472 -0.8588 19.8245 37.1343 50.9951
Phili 0.9130 25.3133 31.5649 46.7532 57.4127
Ricardo -1.4875 15.1558 20.2132 31.4574 44.1145
Sarah -3.1453 21.3919 13.7822 35.8166 46.1248
Shiva 15.3988 28.4587 31.5300 40.8340 48.3794
TABLE I: Performance Comparison in GPSNR (dB)
Meshfix Meshlab [35] [36] Proposed
Andrew 10.5931 16.7684 7.6424 2.7865 0.6759
David 11.4641 7.5213 3.5741 4.0126 0.6285
House 13.2186 19.2414 11.1300 3.1533 0.6109
Longdress 14.6943 18.1654 2.7877 2.3145 0.3889
Phili 18.4443 21.1001 13.4530 6.1444 1.4678
Ricardo 3.9157 4.5453 1.1547 1.0548 0.2422
Sarah 11.7554 6.9805 7.8014 3.7259 0.4770
Shiva 8.4811 13.4852 9.6078 1.6446 0.2960
TABLE II: Performance Comparison in NSHD

In Fig. 10, we synthesize two holes in Sarah and House, with simple and complex geometric structure respectively. We observe that Meshlab cannot fill the holes completely, while [35] introduces wrong geometry around the holes since it attempts to connect the boundary of the hole region with a simple plane. The results of [36] reveal its drawback: it fills the hole by shrinking it circularly along its boundary, so the results always depend on the shape of the hole and the geometric trend around it. In comparison, our results in Fig. 10 (e) are inpainted from the non-local cubes with the most similar structure, and are almost the same as the ground truth in Fig. 10 (f).

Further, we show the other hole in House, marked with a red circle in Fig. 10, from a different view in Fig. 11. We see that some noise remains in the ground truth due to the acquisition equipment, while our inpainting result noticeably attenuates this noise. This additional denoising effect comes from the property of the graph-signal smoothness prior.

Fig. 11: Denoising effect on another synthetic hole in House. (a) The ground truth. (b) The result obtained using the proposed method.

Ix Conclusion

We exploit the local frequency interpretation and non-local self-similarity on graphs for point cloud inpainting. Firstly, by defining frequency on a graph as the number of zero crossings in graph nodal domains, we prove the positive correlation between the eigenvalues of the graph Laplacian and the frequencies of its eigenvectors. Based on this frequency interpretation, we elaborate on the smoothing and denoising functionality of the graph-signal smoothness prior. Then we propose an efficient 3D point cloud inpainting approach leveraging this prior, where the key observation is that point clouds exhibit non-local self-similarity in geometry. Specifically, we adopt fixed-size cubes as the processing unit, and search for the cube most similar to the target cube that contains holes. The similarity metric is based on the direct component and the anisotropic graph total variation of normals in the cubes. We then cast the hole-filling problem as a quadratic programming problem, based on the selected most similar cube and regularized by the graph-signal smoothness prior, which performs denoising along with inpainting. Besides, we propose voxelization and automatic hole detection for point clouds prior to inpainting. Experimental results show that our algorithm outperforms four competing methods significantly. In the future, we will consider extending the approach to inpainting the color attributes of point clouds jointly with the geometry.


  • [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
  • [2] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, vol. 92, American Mathematical Society, 1997.
  • [3] S. Sardellitti, S. Barbarossa, and P. D. Lorenzo, “On the graph Fourier transform for directed graphs,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 796–811, 2017.
  • [4] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing,” arXiv:1712.00468v1, 2017.
  • [5] A. Buades, B. Coll, and J. M. Morel, “A non-local algorithm for image denoising,” in IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.
  • [6] D. Glew, “Self-similarity of images and non-local image processing,” 2011.
  • [7] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, “Simultaneous structure and texture image inpainting,” IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 882–889, 2003.
  • [8] A. Criminisi, P. Pérez, and K. Toyama, “Region filling and object removal by exemplar-based image inpainting,” IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1200–1212, 2004.
  • [9] I. Daribo, T. Maugey, G. Cheung, and P. Frossard, “R-D optimized auxiliary information for inpainting-based view synthesis,” in 3DTV-Conference: the True Vision - Capture, Transmission and Display of 3D Video, vol. 24, no. 4, 2012, pp. 1–4.
  • [10] H. Yang and X. Hou, Texture segmentation using image decomposition and local self-similarity of different features.   Kluwer Academic Publishers, 2015.
  • [11] S. H. El-Din and M. Moniri, “Fragile and semi-fragile image authentication based on image self-similarity,” in International Conference on Image Processing. 2002. Proceedings, vol. 2, no. 2, 2002, pp. 897–900.
  • [12] Z. H. Qi, L. Li, Z. M. Zhang, and X. Q. Qi, “Self-similarity analysis of eubacteria genome based on weighted graph,” Journal of Theoretical Biology, vol. 280, no. 1, pp. 10–18, 2011.
  • [13] J. Pang, G. Cheung, W. Hu, and O. Au, “Redefining self-similarity in natural images for denoising using graph signal gradient,” in Signal and Information Processing Association Summit and Conference, 2014, pp. 1–8.
  • [14] C. Tulvan, R. Mekuria, and Z. Li, “Use cases for point cloud compression (pcc),” in ISO/IEC JTC1/SC29/WG11 (MPEG) output document N16331, June 2016.
  • [15] Z. Fu, W. Hu, and Z. Guo, “Point cloud inpainting on graphs from non-local self-similarity,” in IEEE International Conference on Image Processing, Athens, Greece, 2018.
  • [16] N. Shahid, N. Perraudin, V. Kalofolias, B. Ricaud, and P. Vandergheynst, “PCA using graph total variation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
  • [17] P. Berger, G. Hannak, and G. Matz, “Graph signal recovery via primal-dual algorithms for total variation minimization,” IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
  • [18] Y. Bai, G. Cheung, X. Liu, and W. Gao, “Blind image deblurring via reweighted graph total variation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, 2017.
  • [19] P. J. Besl and N. D. Mckay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 14, no. 2, pp. 239–256, 2002.
  • [20] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The trimmed iterative closest point algorithm,” in International Conference on Pattern Recognition, 2002. Proceedings, vol. 3, 2002, pp. 545–548.
  • [21] A. Emelyanov and V. Skala, “Surface reconstruction from problem point clouds,” 2002 International Conference Graphic, pp. 31–40, 2002.
  • [22] X. Orriols and X. Binefa, “Finding breaking curves in 3D surfaces,” The 1st Iberian Conference on Pattern Recognition and Image Analysis, vol. 2652, pp. 681–688, 2003.
  • [23] M. S. Floater and M. Reimers, “Meshless parameterization and surface reconstruction,” Computer Aided Geometric Design, vol. 18, no. 2, pp. 77–92, 2001.
  • [24] V. S. Nguyen, T. H. Trinh, and M. H. Tran, “Hole boundary detection of a surface of 3D point clouds,” in International Conference on Advanced Computing and Applications, 2015, pp. 124–129.
  • [25] N. H. Aldeeb and O. Hellwich, “Detection and classification of holes in point clouds,” 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 321–330, 2017.
  • [26] S. Chen, D. Tian, C. Feng, A. Vetro, and J. Kovačević, “Contour-enhanced resampling of 3D point clouds via graphs,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, pp. 2941–2945.
  • [27] G. H. Bendels, R. Schnabel, and R. Klein, “Detecting holes in point set surfaces,” Journal of Wscg, vol. 14, 2006.
  • [28] X. He, Y. Zhuo, and M. Pang, “An algorithm for extracting hole-boundary from point clouds,” Transactions of the Chinese Society for Agricultural Machinery, vol. 45, no. 2, pp. 291–296, 2014.
  • [29] X. Wu and W. Chen, “A scattered point set hole-filling method based on boundary extension and convergence,” in Intelligent Control and Automation, 2015, pp. 5329–5334.
  • [30] P. Sahay and A. N. Rajagopalan, “Harnessing self-similarity for reconstruction of large missing regions in 3D models,” in International Conference on Pattern Recognition, 2012, pp. 101–104.
  • [31] ——, “Geometric inpainting of 3D structures,” in Computer Vision and Pattern Recognition Workshops, 2015, pp. 1–7.
  • [32] S. Shankar, S. A. Ganihar, and U. Mudenagudi, “Framework for 3D object hole filling,” in Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, 2015, pp. 1–4.
  • [33] F. Lozes, A. Elmoataz, and O. Lézoray, “PDE-based graph signal processing for 3-D color point clouds : opportunities for cultural heritage,” IEEE Signal Processing Magazine, vol. 32, no. 4, pp. 103–111, 2015.
  • [34] C. Dinesh, I. V. Bajic, and G. Cheung, “Exemplar-based framework for 3D point cloud hole filling,” in IEEE International Conference on Visual Communications and Image Processing, May 2017.
  • [35] J. Wang, M. M. Oliveira, M. Garr, and M. Levoy, “Filling holes on locally smooth surfaces reconstructed from point clouds,” Image & Vision Computing, vol. 25, no. 1, pp. 103–113, 2007.
  • [36] F. Lozes, A. Elmoataz, and O. Lézoray, “Partial difference operators on weighted graphs for image processing on surfaces and point clouds,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3896–909, 2014.
  • [37] H. Lin and W. Wang, “Feature preserving holes filling of scattered point cloud based on tensor voting,” in IEEE International Conference on Signal and Image Processing, 2017, pp. 402–406.
  • [38] S. Friedman and I. Stamos, “Online facade reconstruction from dominant frequencies in structured point clouds,” in Computer Vision and Pattern Recognition Workshops, 2012, pp. 1–8.
  • [39] G. Shen, W.-S. Kim, S. K. Narang, A. Ortega, J. Lee, and H. Wey, “Edge-adaptive transforms for efficient depth map coding,” in IEEE Picture Coding Symposium, Nagoya, Japan, December 2010, pp. 566–569.
  • [40] W. Hu, G. Cheung, X. Li, and O. C. Au, “Depth map compression using multi-resolution graph-based transform for depth-image-based rendering,” in IEEE International Conference on Image Processing, Orlando, FL, September 2012, pp. 1297–1300.
  • [41] W. Hu, G. Cheung, A. Ortega, and O. C. Au, “Multi-resolution graph fourier transform for compression of piecewise smooth images,” in IEEE Transactions on Image Processing, vol. 24, no. 1, January 2015, pp. 419–33.
  • [42] D. A. Spielman, “Lecture 2 of spectral graph theory and its applications,” September 2004.
  • [43] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Applied & Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2009.
  • [44] E. Briandavies, G. L. Gladwell, J. Leydold, and P. F. Stadler, “Discrete nodal domain theorems,” Linear Algebra & Its Applications, vol. 336, no. 1, pp. 51–60, 2001.
  • [45] H. Xu and S. T. Yau, “Nodal domain and eigenvalue multiplicity of graphs,” Journal of Combinatorics, vol. 3, no. 4, 2013.
  • [46] A. Bundy and L. Wallen, Breadth-first search.   Springer Berlin Heidelberg, 1984.
  • [47] H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
  • [48] A. Chambolle, An algorithm for total variation minimization and applications.   Kluwer Academic Publishers, 2004.
  • [49] D. Khabipova, J. P. Marques, G. Puy, R. Gruetter, and Y. Wiaux, “The importance of priors for l2 regularization and total variation methods in quantitative susceptibility mapping,” in 29th Annual Meeting of the European Society for Magnetic Resonance in Medicine and Biology, Lisbon, Portugal, 2012.
  • [50] C. Loop, Q. Cai, S. O. Escolano, and P. A. Chou, “Microsoft voxelized upper bodies - a voxelized point cloud dataset,” in ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012, May 2016.
  • [51] R. Cohen, H. Ochimizu, D. Tian, and A. Vetro, “Mobile mapping system point cloud data from mitsubishi electric,” in ISO/IEC JTC1/SC29/WG11 (MPEG2017) input document M40495, April 2017.
  • [52] http://graphics.stanford.edu/data/3Dscanrep/.
  • [53] https://www.yobi3d.com/.
  • [54] https://jpeg.org/plenodb/.
  • [55] http://vcl.iti.gr/dataset/reconstruction/.
  • [56] M. Attene, “A lightweight approach to repairing digitized polygon meshes,” The Visual Computer, vol. 26, no. 11, pp. 1393–1406, 2010.
  • [57] P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, and G. Ranzuglia, “Meshlab: an open-source mesh processing tool,” in Eurographics Italian Chapter Conference, S. Vittorio, D. C. Rosario, and F. Ugo, Eds.   The Eurographics Association, 2008.
  • [58] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro, “Geometric distortion metrics for point cloud compression,” in IEEE International Conference on Image Processing, Beijing, China, September 2017.
  • [59] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing images using the hausdorff distance,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 15, no. 9, pp. 850–863, 1993.