1 Introduction
A point cloud is a set of points in 3D space which can have associated attributes such as color or normals. Point clouds are essential for numerous applications ranging from archeology and architecture to virtual and mixed reality. Since they can contain millions of points with complex attributes, efficient point cloud compression (PCC) is essential to make these applications feasible in practice.
When compressing a point cloud, we usually consider two aspects: the geometry, that is, the 3D coordinates of each individual point; and the attributes, for example RGB colors. We can also distinguish dynamic point clouds, which change in the temporal dimension, from static point clouds. The Moving Picture Experts Group (MPEG) is leading PCC standardization efforts [schwarz_emerging_2018]. Specifically, two main solutions have emerged: the first, Geometry PCC (GPCC), uses native 3D data structures; the second, Video-based PCC (VPCC), mainly targets dynamic point clouds and projects the data onto a 2D plane to make use of available video codecs such as HEVC.
A point cloud can be interpreted as a discrete 2D manifold in 3D space. Thus, instead of compressing point cloud attributes using 3D structures such as octrees, we could fold this 2D manifold onto an image. This opens many avenues of research, as it provides, e.g., a way to apply existing image processing techniques directly to point cloud attributes. In this work, we propose a novel system for folding a point cloud and mapping its attributes to a 2D grid. Furthermore, we demonstrate that the proposed approach can be used to compress static point cloud attributes efficiently.
2 Related Work
Our work is at the crossroads of static point cloud attribute compression and deep representation learning of 3D data. Compressing static point cloud attributes has been explored using graph transforms [zhang_point_2014], the Region-Adaptive Hierarchical Transform (RAHT) [queiroz_compression_2016] and volumetric functions [krivokuca_volumetric_2018]. Graph transforms take advantage of the Graph Fourier Transform (GFT) and the neighborhood structure present in the 3D space to compress point cloud attributes. The RAHT provides a hierarchical transform which extends the Haar wavelet transform to an octree representation. In this paper, we propose a different perspective and leverage the manifold interpretation of the point cloud by flattening its attributes onto a 2D grid, which can then be compressed as an image.
Deep learning methods have been used for representation learning and compression of point clouds [quach_learning_2019]. In particular, the initial folding in our work is inspired by [yang_foldingnet:_2017], where an autoencoder network is trained on a dataset to learn how to fold a 2D grid onto a 3D point cloud. We build on this folding idea, but employ it in a very different way. Specifically, we do not aim at learning a good representation that generalizes over a dataset; instead, we employ the folding network as a parametric function that maps an input 2D grid to points in 3D space. The parameters of this function (i.e., the weights of the network) are obtained by overfitting the network to a specific point cloud. In addition, the original folding proposed in [yang_foldingnet:_2017] is highly inefficient for PCC, as it adapts poorly to complex geometries and alters the local density distribution of 3D points. We therefore also propose a number of solutions to improve folding.
3 Proposed method
We propose a novel system for compressing point cloud attributes centered around the idea that a point cloud can be seen as a 2D discrete manifold in 3D space. Thus, we can obtain a 2D parameterization of the point cloud and map its attributes onto a grid, making it possible to employ 2D image processing algorithms and compression tools. In a nutshell, our approach is based on the following two steps: a) we find a parametric function (specifically, a deep neural network) to fold a 2D grid onto a 3D point cloud; b) we assign attributes (e.g., colors) of the original point cloud to this grid. The grid and the parametric function contain all the information necessary to recover the point cloud attributes. Assuming the point cloud geometry is coded separately and transmitted to the decoder, the folding function can be constructed at the decoder side, so the 2D grid is fully decodable without transmitting any network parameters. In practice, the 3D-to-2D mapping is lossy, which introduces distortion in step b). In the following, we propose several strategies to reduce this distortion. The overall system is depicted in Figure 1.
3.1 Definitions
Given a nonnegative matrix $M$, we define its row-wise normalization $\overline{M}$ as
$$\overline{M}_{ij} = \frac{M_{ij}}{\sum_{l} M_{il}}.$$
Given , we also define a normalization as follows .
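Given the 3D coordinates of points in two point clouds $P \in \mathbb{R}^{N_P \times 3}$ and $Q \in \mathbb{R}^{N_Q \times 3}$, we define the adjacency matrix $A^{k}_{P,Q} \in \{0,1\}^{N_P \times N_Q}$, where $(A^{k}_{P,Q})_{ij} = 1$ if and only if $q_j$ is one of the $k$ nearest neighbors of $p_i$. We also define the shorthand $A_{P,Q} = A^{1}_{P,Q}$, where $(A_{P,Q})_{ij} = 1$ if and only if $q_j$ is the nearest neighbor of $p_i$.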
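The two definitions above can be illustrated with a minimal NumPy sketch. `row_normalize` divides each row by its sum; leaving all-zero rows at zero is an assumption of the sketch, since the text does not say how they are handled. `knn_adjacency` builds the binary $k$-nearest-neighbor adjacency matrix from one point cloud to another with brute-force distances, and `k=1` gives the nearest-neighbor shorthand. Both function names are hypothetical.

```python
import numpy as np


def row_normalize(M):
    """Row-wise normalization: each row is divided by its sum.
    Assumption: rows that sum to zero are left as zeros."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums != 0)


def knn_adjacency(P, Q, k=1):
    """Binary N_P x N_Q matrix whose entry (i, j) is 1 iff Q[j] is one of the
    k nearest neighbors of P[i]; k=1 gives the nearest-neighbor shorthand."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]    # k closest columns per row
    A = np.zeros(d2.shape)
    np.put_along_axis(A, nearest, 1.0, axis=1)
    return A


P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = np.array([[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [5.0, 0.0, 0.0]])
print(knn_adjacency(P, Q))                       # one 1 per row, at the nearest neighbor
print(row_normalize(knn_adjacency(P, Q, k=2)))   # each row now sums to 1
```

The dense distance matrix is only meant for small examples; for point clouds with millions of points, a spatial index such as `scipy.spatial.cKDTree` would be the usual replacement.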
3.2 Grid folding
We propose a grid folding in two steps, namely, an initial folding step to obtain a rough reconstruction of the point cloud and a folding refinement step to improve the reconstruction quality, which is essential for mapping point cloud attributes with minimal distortion.
3.2.1 Initial folding
We fold a grid onto a point cloud to obtain its 2D parameterization by solving the following optimization problem:
$$\min_{f} \; L(X, \hat{X}),$$
where $X$ is a set of $N_X$ points in the considered point cloud, $G$ a set of $N_G$ points of a 2D grid with 3D coordinates, $\hat{X} = f(X, G)$ is a set of $N_G$ points obtained by folding $G$ onto $X$, $L$ is a loss function and $f$ is a parameterized folding function.
We parameterize $f$ using a neural network composed of an encoder $f_e$ and a decoder $f_d$ such that $y = f_e(X)$ and $\hat{X} = f_d(G, y)$. The encoder $f_e$ is composed of four pointwise convolutions with filter sizes of followed by a max-pooling layer. The decoder $f_d$ is composed of two folding layers with $f_d(G, y) = f_{d,2}(f_{d,1}(G, y), y)$. Each folding layer has two pointwise convolutions with filter sizes of and concatenates $y$ to its input. The last pointwise convolution has a filter size of 3, as it outputs 3D coordinates. We use ReLU activations in the encoder and LeakyReLU activations in the decoder.
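To make the encoder/decoder data flow concrete, here is a forward pass in plain NumPy with random, untrained weights; it only illustrates shapes and how the codeword is concatenated to every grid point, not the training by overfitting. The layer widths (128 in the encoder, 64 in the folding layers), the weight initialization and the LeakyReLU slope are hypothetical placeholders, and we read each folding layer as ending in a 3-channel pointwise convolution that outputs 3D coordinates, as in FoldingNet. A pointwise convolution over a set of points is written as a linear map shared by all points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical widths: not given in this section of the text.
ENC_WIDTH, FOLD_WIDTH, CODE_SIZE = 128, 64, 128


def layer(c_in, c_out):
    """Random weights for one pointwise (1x1) convolution, i.e. a linear map shared by all points."""
    return rng.normal(0.0, 0.1, (c_in, c_out)), np.zeros(c_out)


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


encoder_layers = [layer(3, ENC_WIDTH), layer(ENC_WIDTH, ENC_WIDTH),
                  layer(ENC_WIDTH, ENC_WIDTH), layer(ENC_WIDTH, CODE_SIZE)]
fold1_layers = [layer(3 + CODE_SIZE, FOLD_WIDTH), layer(FOLD_WIDTH, 3)]
fold2_layers = [layer(3 + CODE_SIZE, FOLD_WIDTH), layer(FOLD_WIDTH, 3)]


def f_e(X):
    """Encoder: four pointwise convolutions with ReLU, then max pooling over points."""
    h = X
    for W, b in encoder_layers:
        h = relu(h @ W + b)
    return h.max(axis=0)  # codeword y, shape (CODE_SIZE,)


def folding_layer(P, y, layers):
    """Concatenate the codeword y to every input point, then two pointwise convolutions."""
    (W1, b1), (W2, b2) = layers
    h = np.concatenate([P, np.tile(y, (len(P), 1))], axis=1)
    h = leaky_relu(h @ W1 + b1)
    return h @ W2 + b2  # 3 output channels: 3D coordinates


def f(X, G):
    """Folding function: X_hat = f_d(G, y) with y = f_e(X) and two stacked folding layers."""
    y = f_e(X)
    return folding_layer(folding_layer(G, y, fold1_layers), y, fold2_layers)


# 2D grid with 3D coordinates (z = 0), folded toward a random stand-in point cloud.
u, v = np.meshgrid(np.linspace(-1.0, 1.0, 32), np.linspace(-1.0, 1.0, 32))
G = np.stack([u.ravel(), v.ravel(), np.zeros(u.size)], axis=1)
X = rng.normal(size=(500, 3))
X_hat = f(X, G)
print(X_hat.shape)  # (1024, 3)
```

Note that the output has one point per grid point, regardless of how many points the input cloud has; this is what gives every reconstructed point a fixed pixel position.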
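We propose the following loss function
$$L(X, \hat{X}) = d_{\mathrm{CD}}(X, \hat{X}) + d_{\mathrm{rep}}(\hat{X}),$$
where $d_{\mathrm{CD}}$ is the Chamfer distance:
$$d_{\mathrm{CD}}(X, \hat{X}) = \sum_{x \in X} \min_{\hat{x} \in \hat{X}} \lVert x - \hat{x} \rVert_2^2 + \sum_{\hat{x} \in \hat{X}} \min_{x \in X} \lVert x - \hat{x} \rVert_2^2,$$
and $d_{\mathrm{rep}}$ is a newly defined repulsion loss:
$$d_{\mathrm{rep}}(\hat{X}) = \operatorname{Var}\left( \left\{ \min_{\hat{x}' \in \hat{X} \setminus \{\hat{x}\}} \lVert \hat{x} - \hat{x}' \rVert_2^2 \;:\; \hat{x} \in \hat{X} \right\} \right).$$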
The Chamfer distance ensures that the reconstruction is similar to the input point cloud, and the repulsion loss penalizes variations in the reconstruction's density.
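The loss can be sketched in NumPy with brute-force pairwise distances. The Chamfer term below sums squared nearest-neighbor distances in both directions. For the repulsion term, we assume one simple way to penalize density variations: the variance of each reconstructed point's squared distance to its nearest neighbor, which is zero for perfectly uniform spacing. The squared-distance convention, this repulsion formula and all names are assumptions of the sketch.

```python
import numpy as np


def pairwise_sq_dists(P, Q):
    """N_P x N_Q matrix of squared Euclidean distances."""
    return ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)


def chamfer(X, X_hat):
    """Symmetric Chamfer distance: nearest-neighbor squared distances summed in both directions."""
    d2 = pairwise_sq_dists(X, X_hat)
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()


def repulsion(X_hat):
    """Assumed repulsion term: variance of each point's squared distance to its nearest
    neighbor in X_hat, which is zero when the local density is perfectly uniform."""
    d2 = pairwise_sq_dists(X_hat, X_hat)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return d2.min(axis=1).var()


def folding_loss(X, X_hat):
    return chamfer(X, X_hat) + repulsion(X_hat)


square = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
uneven = np.array([[0.0, 0, 0], [0.1, 0, 0], [1, 0, 0], [2, 0, 0]])
print(folding_loss(square, square))  # 0.0: perfect reconstruction, uniform spacing
print(repulsion(uneven) > 0)         # True: clustered points are penalized
```

In a training loop, the same expressions would be written with a differentiable framework so that gradients flow back to the folding network's weights.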
We obtain the folding function by training the neural network with the Adam optimizer [kingma_adam:_2014], using the input point cloud as its only input; this is equivalent to overfitting the network on a single sample.
3.2.2 Folding refinement
The initial folding struggles to reconstruct complex shapes accurately (Figure 1(b)). Specifically, the two main issues are the density mismatch between the original and the folded point clouds, and the imprecise reconstruction of complex shapes, as evidenced by Figure 2(a). Both issues heavily penalize the mapping of attributes from the original point cloud to the folded one, introducing distortion in the reconstructed point cloud attributes. The folding refinement reduces variations in local density while preserving the local neighborhood structure on the grid. In other words, points that are close in 3D space are mapped to neighboring points on the 2D grid, which is important to preserve spatial correlations on the 2D grid.
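First, we reduce local density variations by defining a linear operator $D$. To this end, we set the position of each point to an average of its neighbors' positions weighted by an inverse density estimate. Let $A_G \in \{0,1\}^{N_G \times N_G}$ be the grid neighborhood adjacency matrix, where $(A_G)_{ij} = 1$ if and only if grid point $j$ is a horizontal or vertical neighbor of grid point $i$. We build $\mathbf{d} \in \mathbb{R}^{N_G}$, an inverse density vector with
$$d_i = \frac{\sum_j (A_G)_{ij} \lVert \hat{x}_i - \hat{x}_j \rVert_2}{\sum_j (A_G)_{ij}},$$
where $d_i$ is higher if the neighbors of $\hat{x}_i$ are far (low density) and lower if they are near (high density). We then adjust $A_G$ using the inverse density, giving us $D = \overline{A_G \operatorname{diag}(\mathbf{d})}$, the row-wise normalization of $A_G \operatorname{diag}(\mathbf{d})$.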
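A sketch of the density-equalizing operator on an $H \times W$ grid follows. It assumes that the inverse density of a point is the mean distance to its horizontal and vertical grid neighbors; $D$ is then the row-normalized product of the grid adjacency matrix with $\operatorname{diag}(\mathbf{d})$, so each point moves to an average of its grid neighbors in which neighbors lying in sparse regions weigh more. Function names are hypothetical.

```python
import numpy as np


def grid_adjacency(H, W):
    """Adjacency of an H x W grid with horizontal and vertical connections (row-major indexing)."""
    idx = np.arange(H * W).reshape(H, W)
    A = np.zeros((H * W, H * W))
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        A[a.ravel(), b.ravel()] = 1.0
        A[b.ravel(), a.ravel()] = 1.0
    return A


def inverse_density(X_hat, A_G):
    """Assumed estimator: mean distance to grid neighbors (large in sparse regions, small in dense ones)."""
    dist = np.linalg.norm(X_hat[:, None, :] - X_hat[None, :, :], axis=-1)
    return (A_G * dist).sum(axis=1) / A_G.sum(axis=1)


def density_operator(X_hat, A_G):
    """Row-normalized A_G diag(d): grid neighbor j of point i is weighted by d_j."""
    M = A_G * inverse_density(X_hat, A_G)[None, :]
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)


# Regular 3 x 3 grid with unit spacing, then one point pulled toward a neighbor.
u, v = np.meshgrid(np.arange(3.0), np.arange(3.0))
X_hat = np.stack([u.ravel(), v.ravel(), np.zeros(9)], axis=1)
A_G = grid_adjacency(3, 3)
X_hat[1] = [0.2, 0.0, 0.0]   # creates a dense spot next to point 0
D = density_operator(X_hat, A_G)
X_next = D @ X_hat           # one density-equalizing smoothing step
```

Because only grid neighbors enter $D$, the operator never connects points that are far apart on the grid, which is what keeps the 2D neighborhood structure intact.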
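Second, we refine the reconstruction using a matrix $R$ by targeting two issues: insufficient coverage, when the reconstruction $\hat{X}$ does not cover parts of the original point cloud $X$, and imprecision, when $\hat{X}$ fails to reproduce the complexity of $X$. To solve these issues, we propose the following formulation, where the first and second terms respectively target these two issues:
$$R = \overline{A_{X,\hat{X}}^{\top} + A_{\hat{X},X}},$$
with $A_{P,Q}$ the nearest-neighbor adjacency matrix from $P$ to $Q$ and the bar denoting row-wise normalization.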
Finally, we combine these components into a single refinement system to update the point cloud reconstruction:
$$\hat{X} \leftarrow \alpha \hat{X} + (1 - \alpha) \, \frac{D\hat{X} + RX}{2},$$
where $\alpha \in [0, 1]$ is an inertia factor. The term $D\hat{X}$ preserves the grid structure of $\hat{X}$ and encourages uniform density by applying a penalty to points with high density. The term $RX$ effectively creates a bipartite connection scheme between $\hat{X}$ and $X$ and acts in two ways: first, points in $X$ attract their nearest neighbors in $\hat{X}$; second, points in $\hat{X}$ are attracted to their nearest neighbors in $X$.
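The refinement loop can be sketched as follows; it is an illustration under stated assumptions, not a reference implementation. We assume $R$ is the row-normalized sum of the transposed nearest-neighbor adjacency from $X$ to $\hat{X}$ (coverage) and the nearest-neighbor adjacency from $\hat{X}$ to $X$ (precision), and that each update averages $D\hat{X}$ and $RX$ before blending with the inertia factor $\alpha$. The grid operator is supplied by a caller-provided function `make_D` (a hypothetical name) and rebuilt at every iteration; the example uses an identity placeholder in its place.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum (all-zero rows stay zero)."""
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums != 0)


def nn_adjacency(P, Q):
    """Binary N_P x N_Q matrix: entry (i, j) is 1 iff Q[j] is the nearest neighbor of P[i]."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    A = np.zeros_like(d2)
    A[np.arange(len(P)), d2.argmin(axis=1)] = 1.0
    return A


def refinement_matrix(X, X_hat):
    """Assumed form of R (N_G x N_X), row-normalized sum of two bipartite terms:
    nn_adjacency(X, X_hat).T -> each x in X pulls its nearest reconstructed point (coverage);
    nn_adjacency(X_hat, X)   -> each reconstructed point is pulled by its nearest x (precision)."""
    return row_normalize(nn_adjacency(X, X_hat).T + nn_adjacency(X_hat, X))


def refine(X, X_hat, make_D, alpha, iterations):
    """Iterate X_hat <- alpha * X_hat + (1 - alpha) * (D @ X_hat + R @ X) / 2,
    rebuilding the grid operator D = make_D(X_hat) and R at every iteration."""
    for _ in range(iterations):
        D = make_D(X_hat)
        R = refinement_matrix(X, X_hat)
        X_hat = alpha * X_hat + (1.0 - alpha) * 0.5 * (D @ X_hat + R @ X)
    return X_hat


def identity_operator(X_hat):
    """Placeholder grid operator with no density equalization; swap in the real D here."""
    return np.eye(len(X_hat))


rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))            # stand-in original point cloud
X_hat = 0.5 * rng.uniform(size=(64, 3))   # stand-in rough folding covering only part of X
X_ref = refine(X, X_hat, identity_operator, alpha=0.5, iterations=10)
```

Since both $D$ and $R$ are row-stochastic in this sketch, every update is a convex combination of existing positions, so points cannot drift outside the region spanned by $X$ and $\hat{X}$.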
3.3 Attribute Mapping
Once a sufficiently accurate 3D point cloud geometry is reconstructed (Figure 1(c)), we can map attributes from the original point cloud $X$ to its reconstruction $\hat{X}$, which is in one-to-one correspondence with the 2D grid. However, the mapping from $X$ to $\hat{X}$ is not one-to-one, which introduces distortion. Thus, we propose the following attribute mapping procedure and improve it using occupancy optimization.
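In the mapping module, we transfer attributes from $X$ onto $\hat{X}$. We build a mapping vector $\mathbf{m} \in \{1, \dots, N_G\}^{N_X}$, where $m_i$ is the index of the point in $\hat{X}$ associated to $x_i$. Then, denoting by $c_i$ the attributes of $x_i$, we recover the attribute vector $\hat{\mathbf{c}}$ of $\hat{X}$ as
$$\hat{c}_j = \frac{1}{|S_j|} \sum_{i \in S_j} c_i, \qquad S_j = \{\, i : m_i = j \,\},$$
with $S_j$ the set of indices in $X$ associated to a given index $j$ in $\hat{X}$. To reverse the mapping, we map the attributes $\hat{c}_{m_i}$ in $\hat{X}$ to each point $x_i$.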
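Given a mapping vector, the forward and reverse attribute mappings reduce to a scatter-average and a gather. A minimal NumPy sketch follows, with hypothetical function names and 0-based indices.

```python
import numpy as np


def map_attributes(m, colors, n_grid):
    """Forward mapping: grid point j receives the mean color of the original points i with m[i] == j.
    Returns the grid colors and the occupancy |S_j| of every grid point."""
    occupancy = np.bincount(m, minlength=n_grid)
    sums = np.zeros((n_grid, colors.shape[1]))
    np.add.at(sums, m, colors)  # unbuffered scatter-add handles repeated indices
    grid_colors = np.divide(sums, occupancy[:, None], out=np.zeros_like(sums),
                            where=occupancy[:, None] > 0)
    return grid_colors, occupancy


def unmap_attributes(m, grid_colors):
    """Reverse mapping: each original point x_i reads the color of grid point m[i]."""
    return grid_colors[m]


m = np.array([0, 0, 2])                         # points 0 and 1 share grid point 0
colors = np.array([[10.0, 0, 0], [20.0, 0, 0], [0.0, 0, 5]])
grid_colors, occupancy = map_attributes(m, colors, n_grid=3)
decoded = unmap_attributes(m, grid_colors)      # points 0 and 1 both get [15, 0, 0]
```

The example shows both failure modes discussed in the text: points 0 and 1 are averaged into one pixel (mapping distortion), and grid point 1 receives no point at all (an empty pixel).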
We build the mapping $\mathbf{m}$ using the following adaptive procedure. For each point $x_i$ in $X$, we select, among its $k$ nearest neighbors in $\hat{X}$, the point that minimizes the number of points already assigned to it multiplied by its distance to $x_i$. This makes the mapping from $X$ to $\hat{X}$ many-to-one. Note that when multiple points in $X$ are mapped to a point in $\hat{X}$, their attributes are averaged and the mapping introduces distortion. The adaptive mapping procedure mitigates this distortion at the cost of reducing spatial correlations in the mapped image. Also, when a point in $\hat{X}$ has no associated point in $X$, its attributes are undefined, which for color results in black pixels. We fill these unoccupied pixels with the color of their nearest neighbor in $X$ to avoid increasing the image bitrate.
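The adaptive assignment and the hole filling can be sketched as below. Two details are assumptions of this sketch rather than statements of the text: points of $X$ are processed in their input order, and ties in the cost (for instance between several still-empty candidates, whose cost is zero) are broken by distance. The experiments use $k = 9$; the small example uses $k = 2$. Function names are hypothetical.

```python
import numpy as np


def adaptive_assignment(X, X_hat, k=9):
    """For each x_i (in input order), pick among its k nearest points of X_hat the one
    minimizing (points already assigned to it) * (distance to x_i); ties broken by distance."""
    dist = np.linalg.norm(X[:, None, :] - X_hat[None, :, :], axis=-1)
    knn = np.argsort(dist, axis=1, kind="stable")[:, :k]
    assigned = np.zeros(len(X_hat), dtype=int)
    m = np.empty(len(X), dtype=int)
    for i, cands in enumerate(knn):
        d = dist[i, cands]
        cost = assigned[cands] * d
        j = cands[np.lexsort((d, cost))[0]]  # primary key: cost, secondary key: distance
        m[i] = j
        assigned[j] += 1
    return m


def fill_empty(X, X_hat, grid_colors, occupancy, colors):
    """Give every unoccupied grid point the color of its nearest point in X (instead of black)."""
    out = grid_colors.copy()
    empty = np.flatnonzero(occupancy == 0)
    if empty.size:
        d = np.linalg.norm(X_hat[empty][:, None, :] - X[None, :, :], axis=-1)
        out[empty] = colors[d.argmin(axis=1)]
    return out


X = np.array([[0.0, 0, 0], [0.01, 0, 0], [1.0, 0, 0]])
X_hat = np.array([[0.0, 0, 0], [0.05, 0, 0], [1.0, 0, 0], [3.0, 0, 0]])
colors = np.eye(3)                                 # one distinct color per original point
m = adaptive_assignment(X, X_hat, k=2)             # plain nearest neighbor would give [0, 0, 2]
occupancy = np.bincount(m, minlength=len(X_hat))
grid_colors = np.zeros((len(X_hat), 3))
np.add.at(grid_colors, m, colors)
occupied = occupancy > 0
grid_colors[occupied] /= occupancy[occupied][:, None]  # average colors per grid point
grid_colors = fill_empty(X, X_hat, grid_colors, occupancy, colors)
```

In the example, the second point is sent to a slightly farther but empty grid point instead of sharing a pixel with the first one, and the unused grid point at $x = 3$ takes the color of its nearest original point rather than black.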
3.3.1 Occupancy optimization
We observe that overdensities of points in the original geometry with respect to the folded one cause distortion due to the many-to-one attribute mapping. We mitigate this problem by adding “free slots” in the 2D grid (see Fig. 1(d)), using the following procedure. Using the previous definition of the mapping vector $\mathbf{m}$, we obtain the occupancy vector $\mathbf{o}$ such that $o_j = |\{\, i : m_i = j \,\}|$, i.e., the number of points of $X$ mapped to the $j$-th grid point. In particular, we can reshape $\hat{X}$ and $\mathbf{o}$ as matrices using their grid structure. We compute row-wise and column-wise mean occupancies (zeros excluded) and select the row or column with the maximum mean occupancy. Then, we reduce its occupancy by inserting additional rows or columns around it. We repeat this procedure until the maximum mean occupancy is equal to 1 (lossless mapping) or the relative change in mean occupancy exceeds a given threshold $\epsilon$.
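One step of the occupancy optimization can be sketched on the reshaped occupancy matrix ($H \times W$) and the grid of folded points ($H \times W \times 3$). How the inserted rows or columns are positioned in 3D is not specified in the text; purely as a placeholder, this sketch inserts one line on each side of the selected one, placed at the midpoint with its neighboring line. Re-running the attribute mapping and the stopping test are left out, and all names are hypothetical.

```python
import numpy as np


def mean_occupancy(O, axis):
    """Mean occupancy of each row (axis=1) or column (axis=0), zero entries excluded."""
    total = O.sum(axis=axis)
    nonzero = (O > 0).sum(axis=axis)
    return np.divide(total, nonzero, out=np.zeros(total.shape), where=nonzero > 0)


def densest_line(O):
    """Return ('row' | 'col', index, mean occupancy) of the line with maximum mean occupancy."""
    rows, cols = mean_occupancy(O, axis=1), mean_occupancy(O, axis=0)
    if rows.max() >= cols.max():
        return "row", int(rows.argmax()), float(rows.max())
    return "col", int(cols.argmax()), float(cols.max())


def insert_around(grid, axis, i):
    """Insert one new row (axis=0) or column (axis=1) before and after line i of an
    H x W x 3 grid; placeholder positions are midpoints with the adjacent line."""
    g = np.moveaxis(grid, axis, 0)
    before = (g[i - 1] + g[i]) / 2 if i > 0 else g[i]
    after = (g[i] + g[i + 1]) / 2 if i + 1 < len(g) else g[i]
    out = np.concatenate([g[:i], before[None], g[i:i + 1], after[None], g[i + 1:]], axis=0)
    return np.moveaxis(out, 0, axis)


O = np.array([[1, 1, 0],
              [3, 2, 1],
              [1, 0, 1]])
kind, i, occ = densest_line(O)                    # ('row', 1, 2.0)
grid = np.arange(3.0)[:, None, None] * np.ones((3, 3, 3))
grid = insert_around(grid, 0 if kind == "row" else 1, i)
print(grid.shape)                                 # (5, 3, 3)
```

Excluding zeros from the means matters: an empty pixel should not make an overloaded row look less congested than it is.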
4 Experimental results
We evaluate our system for static point cloud attribute compression and compare it against GPCC v3 [mammou_pcc_2018] and v7 [noauthor_g-pcc_2019]. We also study the impact of folding refinement and occupancy optimization on our method through an ablation study. Since folding tends to be less precise on large point clouds with complex geometries, we manually segment the point clouds into patches and apply our scheme to each patch. The patches are then reassembled in order to compute rate-distortion measures. In the initial folding, we use TensorFlow 1.15.0 [abadi_tensorflow_2016], a learning rate of , , and we enable early stopping. For the folding refinement, we set the inertia factor to and perform 100 iterations. When mapping attributes, we set the number of neighbors considered for assignment to 9. When optimizing occupancy, we set the threshold to . We then perform image compression using BPG [bellard_bpg_nodate], an image format based on HEVC intra [noauthor_high_nodate], with QPs ranging from 20 to 50 with a step of 5.
In Figure 3, we observe that our method performs comparably to GPCC for “longdress” and “redandblack” but slightly worse for “soldier”. This is because the geometry of “soldier” is much more complex, which makes a good reconstruction difficult and introduces mapping distortion. The ablation study shows large rate-distortion gains from improving the reconstruction quality with folding refinement and occupancy optimization. This shows the potential of our method and confirms the importance of reducing the folding distortion.
5 Conclusion
Based on the interpretation of a point cloud as a 2D manifold living in 3D space, we propose to fold a 2D grid onto it in order to compress its attributes with conventional image codecs. As the mapping is intrinsically lossy, this calls for strategies to assign point colors optimally to the pixels of the grid. We have done this heuristically, by equalizing the local density of the points in the reconstructed point cloud, and obtained encouraging coding results. In the future, we plan to make this process more rigorous by optimizing the whole pipeline end to end. Integrating this tool into existing PCC schemes is another promising research avenue.
This work was funded by the ANR ReVeRy national fund (REVERY ANR-17-CE23-0020).