Local Frequency Interpretation and Non-Local Self-Similarity on Graph for Point Cloud Inpainting

09/28/2018 ∙ by Zeqing Fu, et al. ∙ Peking University

As 3D scanning devices and depth sensors mature, point clouds have attracted increasing attention as a format for 3D object representation, with applications in various fields such as tele-presence, navigation and heritage reconstruction. However, point clouds usually exhibit holes of missing data, mainly due to the limitations of acquisition techniques and complicated structure. Further, point clouds are defined on irregular non-Euclidean domains, which are challenging to address with conventional signal processing tools. Hence, leveraging recent advances in graph signal processing, we propose an efficient point cloud inpainting method, exploiting both the local smoothness and the non-local self-similarity in point clouds. Specifically, we first propose a frequency interpretation in the graph nodal domain, based on which we introduce the local graph-signal smoothness prior to describe the local smoothness of point clouds. Secondly, we explore the characteristics of non-local self-similarity by globally searching for the area most similar to the missing region. The similarity metric between two areas is defined based on the direct component and the anisotropic graph total variation of the normals in each area. Finally, we formulate the hole-filling step as an optimization problem based on the selected most similar area, regularized by the graph-signal smoothness prior. In addition, we propose voxelization and automatic hole detection methods applied to the point cloud prior to inpainting. Experimental results show that the proposed approach significantly outperforms four competing methods in both objective and subjective quality.




I Introduction

Graphs are generic data representation forms for describing pairwise relations in data; they consist of vertices and connecting edges. The weight associated with each edge in the graph often represents the similarity between the two vertices it connects. Nowadays, graphs are used to model various types of relations in physical, biological, social and information systems because of their advantages in handling irregular discrete signals. The connectivities and edge weights are either dictated by the physics (e.g., physical distance between nodes) or inferred from the data. The data residing on these graphs is a finite collection of samples, with one sample at each vertex in the graph, which is referred to as a graph signal [1]. Compared with conventional signals (e.g., images and videos) defined on regular grids, graph signals are often defined on non-Euclidean domains where the neighborhood is unclear and the order might be unimportant. Statistical properties such as stationarity and compositionality are not even well defined for graph signals. Such data arise in diverse engineering and science fields, such as logistics data in transportation networks, functionalities in neural networks, temperatures on sensor networks, pixel intensities in images, etc.

For any undirected graph, there is an orthogonal basis corresponding to the eigenvectors of a graph matrix (in spectral graph theory [2], a graph matrix is a matrix describing the connectivities in the graph, such as the commonly used graph Laplacian matrix, which will be introduced in Section III). Thus once a graph has been defined, a definition of frequency is readily available, which allows us to extend conventional signal processing to graph signals. Multiple choices are possible, such as the cut size of the constructed orthogonal basis set [3] and the total variation of the associated spectral component [4]. These methods are able to accurately reflect the relationship among the frequencies of different eigenvectors in certain kinds of graphs, but they are inapplicable to other graphs. Hence, it remains an open question, actively being investigated, how to choose an appropriate local frequency interpretation for a given application, such as inpainting in our context.

As the local information is often not enough for applications like inpainting, we also resort to the wisdom of non-local self-similarity [5]. This idea rests on the assumption of non-local stationarity, and has been widely used in image processing tasks such as denoising, inpainting, texture segmentation and authentication [6, 7, 8, 9, 10, 11]. However, exploiting non-local self-similarity on graphs is challenging because it is difficult to define stationarity on irregular graphs. Qi et al. [12] introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes, but their graph signals are defined on regular Euclidean domains. Pang et al. [13] define self-similarity in images via regular graphs, but there is no such definition for high-dimensional and irregular signals yet.

Fig. 1: Point cloud Phili (a) with three self-similarity examples (b) (c) (d). (b) is an example with a hole due to the inherent limitations of the acquisition equipment. (c) and (d) are regions non-locally self-similar to (b).

In particular, we propose to exploit both local frequency interpretation and non-local self-similarity on graphs for point cloud inpainting. Point clouds have received increasing attention as an efficient representation of 3D objects. A point cloud consists of a set of points, each of which corresponds to a measurement point with 3D coordinates representing the geometric information and possibly attribute information such as color and normal. The development of depth sensing and 3D laser scanning techniques (examples include Microsoft Kinect, Intel RealSense and LiDAR) enables the convenient acquisition of point clouds, thus catalyzing applications in various fields, such as 3D immersive tele-presence, navigation in autonomous driving, and heritage reconstruction [14]. However, point clouds often inevitably exhibit holes of missing data, as shown in Fig. 1 (b). This is mainly due to incomplete scanning views and inherent limitations of the acquisition equipment (for example, 3D laser scanning is less sensitive to objects in dark colors: the darker a surface is, the more light it absorbs and the less it reflects, so the scanning device cannot receive enough reflected light from dark objects to recognize them). Besides, some regions may be missing in the data itself (e.g., dilapidated heritage). Therefore, it is necessary to inpaint incomplete point clouds prior to subsequent applications.

Our proposed point cloud inpainting method has three main contributions. Firstly, we present a frequency interpretation in the graph nodal domain, in which we prove the positive correlation between eigenvalues and the frequencies of eigenvectors for the graph Laplacian. Based on this correlation, we discuss a graph-signal smoothness prior from the perspective of suppressing high-frequency components, so as to smooth and denoise the inpainted area at the same time.

Secondly, we propose a point cloud inpainting method by exploiting the non-local self-similarity in the geometry of point clouds, building on our previous work [15]. We globally search for a source region that is the most similar to the target region (i.e., the region containing the hole), and then fill the hole by formulating an optimization problem with the source region and the aforementioned graph-signal smoothness prior. In particular, we define the similarity metric between two regions on graphs based on 1) the direct component (DC) of the normals of points and 2) the anisotropic graph total variation (AGTV) of normals, which is an extension of the graph total variation [16, 17, 18]. Further, we improve our previous work by 1) mirroring the filtered cubes (the processing unit in our method) so as to augment the non-local self-similar candidates; and 2) registering the geometric structure in the source cube and the target cube in order to strengthen the similarity between them for better inpainting, which includes the translation obtained from the difference in location between the source cube and the target cube, and the rotation obtained by the proposed simplified Iterative Closest Point (ICP) method [19, 20].

Thirdly, we propose voxelization and automatic hole detection prior to point cloud inpainting. Specifically, we perturb each point in the point cloud onto a regular grid so as to facilitate the subsequent processing. Thereafter, we design a hole detection method in the voxelized point cloud based on Breadth-First Search (BFS). We project the 3D point cloud to a 2D depth map along the principal projection direction, search for holes in the 2D depth map via BFS, and finally obtain the 3D holes by calculating the coordinate range of each hole along the projection direction. Experimental results show that our approach significantly outperforms four competing methods in both objective and subjective quality.

The outline of the paper is as follows. We first discuss previous methods in Section II. Then we propose our frequency interpretation in Section III and overview the proposed point cloud inpainting framework in Section IV. Next, we present our voxelization and hole detection methods in Section V and Section VI, respectively. In Section VII, we elaborate on the proposed inpainting approach, including preprocessing, non-local similar cube searching, structure matching and problem formulation. Finally, experimental results and conclusion are presented in Section VIII and IX, respectively.

II Related Work

We review the previous work on hole detection and inpainting respectively.

II-A Hole Detection

Existing hole detection methods for point clouds include mesh-based methods and point-based methods. The idea of mesh-based methods is to construct a mesh over the point cloud, extract the hole boundary by some greedy strategy and treat a point not surrounded by triangle chains as a boundary feature point [21, 22, 23]. Nevertheless, the computational complexity is high, and the extracted boundaries are not accurate enough when the point distribution is uneven. Point-based methods are thus proposed to perform hole detection on the point cloud directly. [24, 25, 26] extract the exterior boundary of the surface based on the neighborhood relationship among the 3D points, and then detect the hole boundary by a growth function on the interior boundary of the surface. However, these methods are unable to distinguish holes from closed edges. Bendels et al. [27] compute a boundary probability for each point, and construct closed loops circumscribing the hole in a shortest-cost-path manner. [28] and [29] detect the K-nearest neighbors of each point and calculate the connections between the neighbors: when the angle between two adjacent edges exceeds a threshold, the point is regarded as a boundary point. Nevertheless, these methods [27, 28, 29] often treat points in sparse areas as the hole boundary, and the detected boundary points are likely to be incomplete and incoherent.

II-B Point Cloud Inpainting

Few point cloud inpainting methods have been proposed; they fall into two main categories according to the cause of holes: (1) restoring holes in the object itself, such as heritage and sculptures [30, 31, 32, 33, 34], and (2) inpainting holes caused by the limitations of scanning devices [35, 36, 37]. For the first class of methods, the main hole-filling data source is an online database, as the holes are often large. Sahay et al. [30] attempt to fill large damaged regions using neighbourhood surface geometry and a geometric prior derived from registered similar examples in a library. They later propose another method [31], which projects the point cloud to a depth map, searches for a similar one from an online depth database via dictionary learning, and then adaptively propagates the local 3D surface smoothness. However, the projection process inevitably introduces geometric loss. Instead, Dinesh et al. [34] fill this kind of holes with only the data in the object itself. In particular, they give priority to points along the hole boundary so as to determine the inpainting order, and find best-matching regions based on an exemplar, which are used to fill the missing area. The results still suffer from geometric distortion due to the limited data source.

The other class of methods focuses on holes generated by the limitations of scanning devices. Such holes are generally smaller than the aforementioned ones, so the information in the data itself is often enough for inpainting. Wang et al. [35] create a triangle mesh from the input point cloud, identify the vicinity of the hole and interpolate the missing area based on a moving least squares approach. Lozes et al. [36] deploy partial difference operators to solve an optimization problem on the point cloud, which refers only to the neighborhood of the hole to compute the geometric structure. Lin et al. [37] segment a hole into several sub-holes, and then fill the sub-holes by extracting geometric features separately with tensor voting. Because these methods draw reference information from the local neighborhood only, their results tend to be more planar than the ground truth. Also, artifacts are likely to occur around the boundary when the geometric structure is complicated.

In addition, some works deal with particular point cloud data. For instance, flattened bar-shaped holes in human body data are studied in [29], where the holes are filled by iteratively calculating the extension direction of the boundary points from their neighboring points. [38] mainly focuses on filling holes in geometrically regular and planar point clouds of buildings caused by obstructions such as trees; it searches for non-local similar blocks by performing Fourier analysis on each scanline in real-time scanning to detect the periodicity of the planar scene elements. These methods have higher data requirements, and are thus unsuitable for general cases.

III Frequency Interpretation in Graph Nodal Domain

We first review basic concepts in spectral graph theory, including graphs, the graph Laplacian and the graph-signal smoothness prior, and then propose the frequency interpretation of the prior in the graph nodal domain, which will be utilized in the proposed point cloud inpainting.

III-A Preliminary

Fig. 2: A $k$-NN graph constructed with $k = 4$. The connections of boundary vertices are omitted.

We consider an undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}\}$ composed of a vertex set $\mathcal{V}$ of cardinality $|\mathcal{V}| = n$, an edge set $\mathcal{E}$ connecting vertices, and a weighted adjacency matrix $\mathbf{W}$. $\mathbf{W}$ is a real symmetric $n \times n$ matrix, where $w_{i,j}$ is the weight assigned to the edge connecting vertices $i$ and $j$. We assume non-negative weights, i.e., $w_{i,j} \ge 0$. For example, the $k$-Nearest Neighbor ($k$-NN) graph is a common undirected graph, which is constructed by connecting each point with its $k$ nearest neighbors, as shown in Fig. 2.

The Laplacian matrix, defined from the adjacency matrix, can be used to uncover many useful properties of a graph [2]. Among the different variants of Laplacian matrices, the combinatorial graph Laplacian used in [39, 40, 41] is defined as $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where $\mathbf{D}$ is the degree matrix, a diagonal matrix with $d_{i,i} = \sum_{j} w_{i,j}$. We choose to adopt the combinatorial graph Laplacian in our problem formulation, because it provides energy compaction in low-frequency components when integrated into the graph-signal smoothness prior, as discussed next. This property enables the structure-adaptive smoothing of the transition from the known region to the inpainted region.
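The two constructions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the Gaussian edge weights and parameter values are assumptions, since this excerpt does not specify the paper's weight function.

```python
# Sketch: build a k-NN graph over 3D points with (assumed) Gaussian edge
# weights, then form the combinatorial graph Laplacian L = D - W.
import numpy as np

def knn_graph(points, k=4, sigma=1.0):
    """Symmetric weighted adjacency W of a k-NN graph."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]       # k nearest, excluding i itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma ** 2)
    return np.maximum(W, W.T)                   # symmetrize (undirected graph)

def combinatorial_laplacian(W):
    return np.diag(W.sum(axis=1)) - W           # L = D - W

pts = np.random.default_rng(0).random((20, 3))
L = combinatorial_laplacian(knn_graph(pts))
print(np.allclose(L, L.T))       # True: L is symmetric
print(np.allclose(L.sum(1), 0))  # True: each row sums to zero
```

The zero row sums and symmetry are exactly the properties that make $\mathbf{L}$ positive semi-definite, which underlies the smoothness prior discussed next.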

III-B Graph-Signal Smoothness Prior and Its Frequency Interpretation

A graph signal refers to data residing on the vertices of a graph. For example, if we construct a $k$-NN graph on the point cloud, then the normal of each point can be treated as a graph signal defined on the $k$-NN graph. This will be discussed further in the proposed cube matching approach in Section VII-B.

A graph signal $\mathbf{x}$ defined on a graph $\mathcal{G}$ is smooth with respect to the topology of $\mathcal{G}$ if

$$\sum_{i \sim j} w_{i,j}\,(x_i - x_j)^2 \le \epsilon, \qquad (1)$$

where $\epsilon$ is a small positive scalar, and $i \sim j$ denotes that vertices $i$ and $j$ are one-hop neighbors in the graph. In order to satisfy (1), $x_i$ and $x_j$ have to be similar for a large edge weight $w_{i,j}$, and could be quite different for a small $w_{i,j}$. Hence, (1) enforces $\mathbf{x}$ to adapt to the topology of $\mathcal{G}$, which is thus coined the graph-signal smoothness prior. As in [42], (1) is concisely written as $\mathbf{x}^\top \mathbf{L}\,\mathbf{x} \le \epsilon$ in the sequel.
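The equivalence between the edge-wise form of the prior in (1) and the quadratic form in the Laplacian can be checked numerically on a toy graph (the adjacency values below are arbitrary, purely for illustration):

```python
# Numerical check: x^T L x equals the weighted sum of squared differences
# over edges, so minimizing it enforces the graph-signal smoothness prior.
import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])          # toy weighted adjacency
L = np.diag(W.sum(1)) - W
x = np.array([1.0, 1.2, 3.0])

quad = x @ L @ x
pairwise = sum(W[i, j] * (x[i] - x[j]) ** 2
               for i in range(3) for j in range(i + 1, 3))
print(np.isclose(quad, pairwise))    # True
```

Note that the heavily weighted edge (weight 2) forces its two endpoint values to agree closely before the quadratic form becomes small, which is exactly the topology-adaptive behavior described above.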

Besides the smoothing property, this prior also has denoising functionality. By the definition of eigenvectors we have

$$\mathbf{L}\,\mathbf{U} = \mathbf{U}\,\boldsymbol{\Lambda}, \qquad (2)$$

where $\mathbf{U}$ is the matrix of the eigenvectors of $\mathbf{L}$, and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of the eigenvalues of $\mathbf{L}$ with $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$. Because $\mathbf{L}$ is a real symmetric matrix, $\mathbf{U}^{-1} = \mathbf{U}^\top$. Then we have

$$\mathbf{L} = \mathbf{U}\,\boldsymbol{\Lambda}\,\mathbf{U}^\top. \qquad (3)$$
The eigenvectors are then used to define the graph Fourier transform (GFT). Formally, for any signal $\mathbf{x} \in \mathbb{R}^n$ residing on the vertices of $\mathcal{G}$, its GFT $\hat{\mathbf{x}}$ is defined in [43] as

$$\hat{\mathbf{x}} = \mathbf{U}^\top \mathbf{x}, \qquad (4)$$

and the inverse GFT follows as

$$\mathbf{x} = \mathbf{U}\,\hat{\mathbf{x}} = \sum_{k=1}^{n} \hat{x}_k\, \mathbf{u}_k, \qquad (5)$$

where $\mathbf{u}_k$ is the $k$-th eigenvector. Therefore, with (3) and (4) we have

$$\mathbf{x}^\top \mathbf{L}\,\mathbf{x} = \mathbf{x}^\top \mathbf{U}\,\boldsymbol{\Lambda}\,\mathbf{U}^\top \mathbf{x} = \hat{\mathbf{x}}^\top \boldsymbol{\Lambda}\,\hat{\mathbf{x}} = \sum_{k=1}^{n} \lambda_k\, \hat{x}_k^2. \qquad (6)$$
Hence, (1) can also be written as $\sum_{k=1}^{n} \lambda_k \hat{x}_k^2 \le \epsilon$. In order to satisfy this, $\hat{x}_k$ with a larger $\lambda_k$ has to be smaller when minimizing (1), so the corresponding terms in (5) are suppressed. Since the frequency of the eigenvector $\mathbf{u}_k$ is positively correlated with the eigenvalue $\lambda_k$, as will be proved in Section III-C, the graph-signal smoothness prior suppresses high-frequency components, thus leading to a compact representation of $\mathbf{x}$.
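The spectral identity above, $\mathbf{x}^\top \mathbf{L}\,\mathbf{x} = \sum_k \lambda_k \hat{x}_k^2$, can be verified numerically on a randomly weighted graph (purely illustrative):

```python
# Illustration: in the GFT basis, x^T L x = sum_k lambda_k * x_hat_k^2,
# so the smoothness prior penalizes high-frequency (large-lambda) content.
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((6, 6))
W = np.triu(W, 1); W = W + W.T        # random symmetric weighted adjacency
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)            # eigenvalues ascending, U orthonormal

x = rng.random(6)
x_hat = U.T @ x                       # GFT
print(np.isclose(x @ L @ x, np.sum(lam * x_hat ** 2)))  # True
print(np.allclose(U @ x_hat, x))                        # inverse GFT recovers x
```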

This prior will be deployed in our problem formulation of point cloud inpainting as a regularization term, as discussed in Section VII.

III-C Proof of Frequency Interpretation in Graph Nodal Domain

While it is known that a larger eigenvalue of the graph Laplacian corresponds to an eigenvector with higher frequency in general, there is no rigorous mathematical proof yet to the best of our knowledge. Hence, we prove it from the perspective of nodal domains [44] and have the following theorem:

Theorem    Regarding the graph Laplacian, there is a positive correlation between eigenvalues and the frequencies of eigenvectors, by defining frequency as the number of zero crossings in graph nodal domains in particular.

Proof    A positive (negative) strong nodal domain of the eigenvector $\mathbf{u}_k$ corresponding to $\lambda_k$ on $\mathcal{G}$ is a maximally connected subgraph such that $u_k(i) > 0$ ($u_k(i) < 0$) for all vertices $i$ in the subgraph. A positive (negative) weak nodal domain of $\mathbf{u}_k$ on $\mathcal{G}$ is a maximally connected subgraph such that $u_k(i) \ge 0$ ($u_k(i) \le 0$) for all vertices $i$ in the subgraph, with $u_k(i) \ne 0$ for at least one vertex in the subgraph. As we know, the number of zero crossings in a signal can represent its frequency. Similarly, the number of zero crossings across nodal domains of the eigenvectors is able to interpret the frequency, as shown in Fig. 3.

Fig. 3: Two 1D examples (row 1 and row 2) of the relationship between the number of zero crossings in graph nodal domains of the eigenvectors (a) (c) and their corresponding frequencies (b) (d), in which the horizontal axis is the index in the eigenvector and the vertical axis is the eigenvector value on each index. (a) has 1 zero crossing while (c) has 2, thus (a) corresponds to lower frequency than (c), as demonstrated in (b) and (d).

In order to prove the positive correlation between the eigenvalue $\lambda_k$ and the frequency of the corresponding eigenvector $\mathbf{u}_k$, we introduce the upper and lower bounds of the number of nodal domains of $\mathbf{u}_k$, denoted by $N_k$:

  • Upper bound: $N_k \le k + s - 1$ ($s$ is the multiplicity of $\lambda_k$) for strong nodal domains and $N_k \le k$ for weak nodal domains [44].

  • Lower bound: $N_k \ge k - l - z$ ($l$ is the minimal number of edges that need to be removed from $\mathcal{G}$ in order to turn it into a forest, $z$ is the number of zeros in $\mathbf{u}_k$) for strong nodal domains [45].

Therefore, both the upper and lower bounds of $N_k$ are monotonically increasing in $k$. This means that $N_k$ grows with $k$ in general, and thus the frequency of $\mathbf{u}_k$ is positively correlated with $\lambda_k$, because the frequency is monotonic in the number of zero crossings across nodal domains and $\lambda_k$ is arranged in non-decreasing order with respect to $k$.
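As an illustrative sanity check of the theorem, on a simple path graph the Laplacian eigenvectors, ordered by eigenvalue, exhibit a non-decreasing number of zero crossings:

```python
# Sanity check on a path graph: counting sign changes along edges of each
# Laplacian eigenvector shows the zero-crossing count grows with the
# eigenvalue index.
import numpy as np

n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0            # unweighted path graph
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)                     # eigenvalues ascending

def zero_crossings(u):
    # count edges whose endpoints carry strictly opposite signs
    return sum(1 for i in range(n - 1) if u[i] * u[i + 1] < 0)

zc = [zero_crossings(U[:, k]) for k in range(n)]
print(zc)   # non-decreasing: [0, 1, 2, 3, 4, 5, 6, 7]
```

For a path graph the eigenvectors are the DCT basis functions, so the $k$-th eigenvector has exactly $k$ sign changes, matching the positive correlation the theorem states.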

IV Overview of the Proposed Inpainting Framework

Fig. 4: The framework of the proposed point cloud inpainting method.

We now introduce the proposed point cloud inpainting method based on the spectral graph theory in Section III. As shown in Fig. 4, the method consists of the following steps:

  • Firstly, we voxelize the input point cloud before inpainting to deal with its irregularity.

  • Secondly, we detect the holes in the voxelized point cloud to inpaint in the next step.

  • Thirdly, we inpaint the missing area. (a) We split it into cubes of fixed size as units to be processed in the subsequent steps. (b) We choose the cube with missing data as the target cube. (c) We search for the most similar cube to the target cube in the point cloud, which is referred to as the source cube, based on the DC and AGTV in the normals of points. (d) The inpainting problem is formulated into an optimization problem, which leverages the source cube and the graph-signal smoothness prior. The closed-form solution of the optimization problem leads to the resulting cube.

  • Finally, we replace the target cube with the resulting cube as the output.

V Voxelization

In order to address the irregularity of point clouds, we first voxelize the point cloud before inpainting, i.e., we perturb each point in the point cloud onto a regular grid, as shown in Fig. 5. The voxelization step has the following advantages:

  • Helps delineate missing areas accurately.

  • Facilitates the determination of the exact locations of points to be filled in.

  • Reduces the computational complexity caused by the irregularity of the point cloud.

Fig. 5: Demonstration of the proposed voxelization for point clouds. (a) The original point cloud. (b) The voxelized result.

The proposed voxelization consists of two steps. Firstly, in order to distribute the points uniformly in the voxels, we preprocess the coordinates to make the distance between each two neighboring points close to 1, and translate the point cloud to the origin. Then we voxelize the points and compute the attribute information for each voxelized point.

V-A Coordinate Preprocessing

For the convenience of the subsequent processing and data storage, we convert the coordinates into integers. We first expand or contract the original coordinates to make the distance between each two neighboring points close to 1. The input data is a point cloud with $\mathbf{P} \in \mathbb{R}^{N \times 3}$ ($N$ is the number of points in the original point cloud) as the location of the points and $\mathbf{C}$ as the attribute information of the points. We then construct the $k$-NN graph mentioned in Section III-A over the original point cloud, and compute the average distance between connected vertices as $\bar{d}$. The coordinates are then converted to $\mathbf{P}'$:

$$\mathbf{P}' = \mathbf{P} / \bar{d}.$$
Next, we translate $\mathbf{P}'$ to the origin of the coordinate axes for simplicity. The translation vector $\mathbf{t} = [t_x, t_y, t_z]$ is computed as $t_x = \min_i x_i'$, where $x_i'$ represents the $x$-coordinates in $\mathbf{P}'$; $t_y$ and $t_z$ are defined in the same way. Then we obtain the translated location $\tilde{\mathbf{P}}$ as

$$\tilde{\mathbf{P}} = \mathbf{P}' - \mathbf{T},$$

where $\mathbf{T}$ is a translation matrix with $\mathbf{t}$ repeated in each row.
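The two preprocessing steps can be sketched as follows. The use of the mean nearest-neighbor distance as the scale factor and the minimum corner as the translation vector follows the description above; the remaining details (k, the brute-force neighbor search) are illustrative assumptions:

```python
# Sketch of coordinate preprocessing: scale so the mean nearest-neighbor
# distance becomes 1, then translate the minimum corner to the origin.
import numpy as np

def preprocess(points, k=4):
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nn = np.sort(np.sqrt(d2), axis=1)[:, 1:k + 1]   # k nearest-neighbor dists
    d_avg = nn.mean()                               # average k-NN distance
    scaled = points / d_avg                         # mean NN distance -> 1
    return scaled - scaled.min(axis=0)              # translate to the origin

pts = np.random.default_rng(2).random((50, 3)) * 10
out = preprocess(pts)
print(np.allclose(out.min(axis=0), 0))   # True: anchored at the origin
```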

V-B Voxel Computation

We split $\tilde{\mathbf{P}}$ into voxels with side length equal to 1. Thus a voxel consists of a set of points $\{\mathbf{p}_i\}$, with $\mathbf{p}_i$ meaning the coordinates of the $i$-th point in the voxel, each of which corresponds to the attribute information denoted as $\mathbf{c}_i$. In each voxel, we replace the set of points with the centering point $\mathbf{p}_c$ of the voxel as long as there are points within, thus obtaining the voxelized point cloud. Then we compute the attribute value $\mathbf{c}$ for each voxel with the centering location as

$$\mathbf{c} = \frac{\sum_i w_i\, \mathbf{c}_i}{\sum_i w_i},$$

where $w_i$ is the weight for the attribute of each point based on the geometric distance to the centering point $\mathbf{p}_c$:

$$w_i = \exp\!\left(-\frac{\|\mathbf{p}_i - \mathbf{p}_c\|_2^2}{\sigma^2}\right),$$

where $\sigma$ is a weighting parameter (set empirically in our experiments). This is based on the assumption that geometrically closer points have more similar attributes in general.

The voxelized point cloud, with the voxel centers as its locations and the computed attributes (the normal at each point in our case), is the final input data for the subsequent processing.
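A minimal sketch of the distance-weighted attribute averaging described above (the kernel parameter and the toy inputs are illustrative):

```python
# Sketch: weighted average of point attributes within one voxel, with
# weights decaying with squared distance to the voxel center.
import numpy as np

def voxel_attribute(points, attrs, center, sigma=1.0):
    """Distance-weighted average of per-point attributes in a voxel."""
    d2 = ((points - center) ** 2).sum(-1)
    w = np.exp(-d2 / sigma ** 2)
    return (w[:, None] * attrs).sum(0) / w.sum()

pts = np.array([[0.1, 0.0, 0.0], [0.4, 0.4, 0.0]])
nrm = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])    # toy normals
c = voxel_attribute(pts, nrm, center=np.zeros(3))
print(c)   # the normal of the closer point dominates the average
```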

VI Automatic Hole Detection

We propose hole detection mainly based on BFS [46]. As shown in Fig. 6, we first project the 3D point cloud to a 2D depth map along a projection direction denoted by $\mathbf{d}$. While $\mathbf{d}$ could be any of the six projection directions along the $x$, $y$ and $z$ axes, we deploy Principal Component Analysis (PCA) [47] on the normals to choose the best projection direction as $\mathbf{d}$, in order to reduce the computational complexity. PCA statistically transforms a set of possibly correlated variables into a set of linearly uncorrelated variables by an orthogonal transformation. The resulting principal component serves as the projection direction in our method. Then we search for holes using BFS, and differentiate holes by marking the points in each hole with different numbers. Finally, we determine the length of each detected hole along $\mathbf{d}$ by calculating the coordinate range of each hole along $\mathbf{d}$, which gives the hole region in the 3D point cloud. We elaborate on the BFS-based detection method as follows.
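The PCA-based choice of projection direction can be sketched as follows (a generic PCA over normals; the exact implementation used in the paper is not reproduced here):

```python
# Sketch: pick the projection direction as the eigenvector of the normal
# covariance matrix with the largest eigenvalue (the first principal
# component of the normals).
import numpy as np

def principal_direction(normals):
    centered = normals - normals.mean(axis=0)
    cov = centered.T @ centered / len(normals)
    w, v = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return v[:, -1]                   # direction of maximum variance

rng = np.random.default_rng(3)
nrm = rng.normal(size=(200, 3))
nrm[:, 2] *= 5.0                      # variance concentrated along z
d = principal_direction(nrm)
print(np.argmax(np.abs(d)))           # 2: the z axis dominates
```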

Fig. 6: The projection step of automatic hole detection for point clouds. (a) The original point cloud. (b) The depth map obtained along the principal projection direction.

There are two kinds of pixels in the obtained depth map: pixels corresponding to points in the point cloud and pixels with no correspondence (i.e., corresponding to holes). Hence, we propose to detect a hole in the depth map by searching for a connected set of pixels with no correspondence, where two pixels are considered connected if they have similar values and are located in one-hop neighboring positions in the horizontal, vertical or diagonal directions. We solve this problem by BFS, as shown in Fig. 7. We first mark pixels with a correspondence as 1 and the other pixels as 0. Then we traverse the pixels in the depth map until we find the first 0-marked pixel, and change its mark to 2 for the first hole (similarly, 3 for the second hole, 4 for the third hole, etc.) so as to distinguish different holes. Next, we search for the connected 0-marked pixels among its one-hop neighbors, and update their marks to 2. We repeat this operation until there are no 0-marked neighbors of the 2-marked pixels. Thus the first hole is detected. Similarly, we continue to detect the second hole by searching for the next 0-marked pixel in the depth map, and change its mark to 3; all the connected 0-marked pixels in its neighborhood are then updated to 3. The same goes for the rest of the holes.

All the holes are now detected based on BFS. However, some of them contain only a few pixels in the depth map, and might be detected merely due to the low density of the point cloud. Therefore, we filter out holes whose number of missing pixels is less than a density-dependent threshold (empirically set to 4, because the distance between two adjacent points after voxelization is generally less than 2). Besides, some detected holes are actually pixels corresponding to the outside of the object. These are distinguished from actual holes by checking whether they include any pixel at the boundary of the depth map, and are then filtered out. The remaining holes will be inpainted.
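The BFS-based labeling can be sketched on a binary depth map as follows. The marking scheme (1 for occupied pixels, 0 for missing, labels from 2 upward) and the border handling are assumptions consistent with the description above:

```python
# Sketch of BFS hole labeling: connected 0-regions (8-neighborhood) get
# labels 2, 3, 4, ...; regions touching the map border are treated as
# background (outside of the object), not holes.
from collections import deque

def label_holes(mask):
    """mask[r][c] == 1 where a point projects; returns labeled grid, hole count."""
    rows, cols = len(mask), len(mask[0])
    grid = [row[:] for row in mask]
    label = 2
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue
            region, border, q = [], False, deque([(r, c)])
            grid[r][c] = label
            while q:                            # BFS over the 0-region
                y, x = q.popleft()
                region.append((y, x))
                if y in (0, rows - 1) or x in (0, cols - 1):
                    border = True               # touches the map border
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 0:
                            grid[ny][nx] = label
                            q.append((ny, nx))
            if border:                          # background, not a hole
                for y, x in region:
                    grid[y][x] = -1
            else:
                label += 1
    return grid, label - 2

depth = [[1, 1, 1, 1, 1],
         [1, 0, 0, 1, 1],
         [1, 0, 1, 1, 1],
         [1, 1, 1, 0, 1],
         [1, 1, 1, 1, 1]]
g, n = label_holes(depth)
print(n)   # 2 distinct holes detected
```

A size threshold (as described above) could then be applied by counting the pixels per label before inpainting.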

Fig. 7: The marking step of the BFS-based hole detection method. (a) The initial marking situation around the holes at the beginning of the detection, where the starting pixel of the first hole is marked in red. (b) The final marking situation at the end of the detection, where the first hole is marked in red, the second in pink, the third in blue, the fourth in green and the fifth in yellow.

VII The Proposed Inpainting Method

We now introduce the proposed point cloud inpainting method based on the spectral graph theory in Section III. As shown in Fig. 4, the input data is the voxelized point cloud, with each point described by its 3D coordinates. Firstly, we split it into fixed-size cubes as units to be processed in the subsequent steps. Secondly, we choose as the target cube a cube with a missing area detected in Section VI. Thirdly, we search the point cloud for the cube most similar to the target cube, which is referred to as the source cube, based on the DC and AGTV of the normals of points. Fourthly, the inpainting problem is formulated as an optimization problem, which leverages the source cube and the graph-signal smoothness prior. The closed-form solution of the optimization problem gives the resulting cube. Finally, we replace the target cube with the resulting cube as the output.

VII-A Preprocessing

We first split the input point cloud into overlapping cubes as the processing unit of the proposed inpainting algorithm. The cube size is set empirically according to the coordinate range of the point cloud, while the overlapping step is also set empirically. This is a trade-off between the computational complexity and ensuring that enough geometric information is available to search for the most similar cube.

Having obtained the cubes, we choose a cube with missing data as the target cube. Further, in order to reduce the computational complexity and increase the accuracy of the subsequent cube matching, we choose candidate cubes by filtering out cubes whose number of points is less than 80% of that of the target cube, and augment the candidates by mirroring the remaining cubes with respect to the x-y plane. These candidates will be used in the next step.
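A sketch of this candidate preparation step (function and variable names are illustrative; cubes are represented as point arrays):

```python
# Sketch: keep cubes with at least 80% as many points as the target cube,
# then augment the survivors with copies mirrored about the x-y plane.
import numpy as np

def candidates(cubes, target, ratio=0.8):
    kept = [c for c in cubes if len(c) >= ratio * len(target)]
    mirrored = []
    for c in kept:
        m = c.copy()
        m[:, 2] = -m[:, 2]            # mirror z across the x-y plane
        mirrored.append(m)
    return kept + mirrored            # originals plus mirrored copies

rng = np.random.default_rng(4)
target = rng.random((100, 3))
cubes = [rng.random((sz, 3)) for sz in (100, 90, 50)]
cands = candidates(cubes, target)
print(len(cands))   # 4: two cubes survive the 80% filter, each mirrored
```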

VII-B Non-Local Similar Cube Searching

In order to search for the most similar cube to the target, we first define the geometric similarity metric between the target cube and each candidate cube as