Predictive Generalized Graph Fourier Transform for Attribute Compression of Dynamic Point Clouds

08/06/2019 · Yiqun Xu et al. · City University of Hong Kong; Institute of Computing Technology, Chinese Academy of Sciences; Peking University

As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as tele-presence, navigation for autonomous driving and heritage reconstruction. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. We thus propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming a Gaussian Markov Random Field model for attributes of dynamic point clouds, where the optimal predictive transform proves to be the Generalized Graph Fourier Transform (GGFT). Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches the temporal correspondence between adjacent frames of point clouds. Finally, we construct a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode from rate-distortion optimization with the proposed offline-trained λ-Q model. Experimental results show that we achieve 13.4% bitrate reduction on average, and up to 25.8%, compared with the state-of-the-art Region-adaptive Hierarchical Transform method.


I Introduction

With the development of depth sensing and 3D graphic technology, 3D dynamic point clouds have attracted intensive attention in the representation for real scenes or 3D objects in motion. A dynamic point cloud consists of a sequence of static point clouds, each of which is composed of a set of points, with 3D coordinates to represent the geometric information as shown in Fig. 1. Attribute information is often associated with each point to enrich its functions, such as colors and normals. Due to the efficient representation, 3D dynamic point clouds have been widely applied in various fields, such as 3D immersive tele-presence, navigation for autonomous vehicles, gaming, and animation [47].

Fig. 1: Two frames in the dynamic point cloud sequence Longdress [13], as well as the merged one to demonstrate the geometry difference between neighboring frames. In (c), frame 1051 is marked in purple while frame 1052 in green.

Nevertheless, the large amounts of data in 3D dynamic point clouds significantly increase the burden for transmission and storage, especially with multiple attributes on each point. Further, unlike images or videos, 3D points are acquired by directly sampling the surfaces of real objects, which usually leads to irregular point distribution. Also, the number of points in each frame mostly varies over time. Hence, it is quite challenging to represent dynamic point clouds compactly. In the face of such necessity and challenges of point cloud coding, the 3D Graphic Group (3DG) of the Moving Picture Experts Group (MPEG) has started the standardization of point cloud compression (PCC) (https://mpeg.chiariglione.org/standards/mpeg-i/point-cloud-compression).

While many approaches have been proposed to support efficient compression of static point clouds, including geometry coding [32, 39, 38] and attribute coding [53, 50, 9], few efforts [45, 30, 2, 10] have been made for dynamic point clouds. Also, existing dynamic point cloud coding methods focus more on the coding of geometry [45, 33, 10], and less on specialized inter-coding tools for attributes [45, 10]. In practice, attributes are critical in providing information for specific applications, e.g., colors and normals for visualization, as well as in rendering point clouds with high quality. Hence, we focus on the attribute coding of dynamic point clouds in this paper.

Recent works on the intra-coding of point cloud attributes can be classified into two categories: 1) Codec-based point cloud compression [31, 30], which reorders point clouds into regular samples and then deploys existing image/video coding tools (e.g., JPEG [49], HEVC [43]). However, this class of methods relies on mature image/video codecs regardless of the intrinsic characteristics of 3D point clouds. 2) Geometry-based point cloud compression [24, 9, 53, 50], which designs geometry-based transforms tailored for irregular point clouds. Among them, the Region-adaptive Hierarchical Transform (RAHT) [9] is the state-of-the-art intra-frame compression method for point cloud attributes; it devises a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet, followed by arithmetic coding.

In order to further compress a dynamic point cloud, inter-coding is required to reduce the temporal redundancy. Similar to intra-coding, there exist two major categories of exploiting the temporal correlation in point clouds: 1) Codec-based methods [30], where the inter-coding of dynamic point clouds reduces to that in HEVC. However, certain features of point clouds may be lost during the process of projecting dynamic point clouds to videos. 2) Geometry-based methods [31, 45], which search the temporal correspondence between neighboring frames of point clouds based on geometry information, via Iterative Closest Point (ICP) [6, 3, 45] or feature matching [31]. Nevertheless, either complicated regions of point clouds are difficult to register via ICP, or the temporal correlation is not fully utilized when encoding the inter-coding residual, leading to sub-optimal inter-coding.

In order to fully exploit intrinsic temporal correlations for compact representation, we propose optimal inter-prediction and predictive transform coding with refined motion estimation for attributes of dynamic point clouds. Firstly, assuming the Gaussian Markov Random Field (GMRF) model [37] and a spatio-temporal graph representation for dynamic point clouds, we derive optimal inter-prediction and predictive transform coding for the prediction residual, which depends on the precision matrix in the GMRF model similar to [54]. However, it is often complicated to estimate the precision matrix statistically. Instead, we interpret the precision matrix as the generalized graph Laplacian matrix in spectral graph theory [8, 42], which is efficient to compute. (In spectral graph theory, the graph Laplacian is an algebraic representation of the connectivity and degree of a graph; among its different variants, the generalized graph Laplacian is formally defined in Sec. III.) The generalized graph Laplacian essentially encodes not only the spatial correlation within each frame, but also the temporal dependency across neighboring frames as the boundary condition of the current frame. This leads to a generalized variant of the Graph Fourier Transform (GFT), referred to as the generalized GFT (GGFT) [22], as the optimal predictive transform. The GGFT is an adaptive transform computed from a graph underlying the signal, and it provably decorrelates the dynamic point cloud both spatially and temporally under the assumed GMRF model.

Secondly, we address the challenge of searching the temporal correspondence between neighboring frames of irregular point clouds via the proposed refined motion estimation. Specifically, we first segment each frame into clusters based on the spatial correlation of geometry, which serve as the processing units for efficient representation. For the target cluster in the current frame, in order to address the challenge that collocated clusters exhibit irregular shapes and different numbers of points, we build a bounding cube around it and register it with the reference frame via ICP. Different from [31], we register a cluster against a larger temporally collocated cube, which provides a sufficient feasible search space of corresponding points. Further, we search the temporal correspondence based on point-to-point Euclidean distance. This yields reference points for the subsequent inter-prediction and transform, and is able to handle complicated regions.

Finally, we design a complete framework of dynamic point cloud attribute compression with an offline-trained λ-Q model for rate-distortion optimization. Specifically, the framework consists of two coding modes to fully exploit spatial-temporal dependencies, including our previous intra-coding [50] as one mode and the proposed inter-coding as the other, both of which are based on GFTs. We determine the optimal coding mode based on rate-distortion optimization to achieve the best trade-off between coding rates and attribute distortions. Further, we establish a λ-Q model to derive the Lagrange multiplier off-line for the best trade-off, according to the statistics of attributes in dynamic point clouds. Experimental results show that we outperform the state-of-the-art Region-adaptive Hierarchical Transform method [9] by 13.4% in bitrate reduction on average for the luminance component. This validates the effectiveness of the proposed refined motion estimation, optimal inter-prediction and predictive GGFT coding for the spatial and temporal decorrelation of dynamic point clouds.

In summary, the main contributions of our work include:

  • We propose optimal inter-prediction and predictive transform coding assuming the GMRF model for attributes of dynamic point clouds. We derive that the optimal predictive transform is the GGFT, which fully decorrelates the inter-prediction residual.

  • We propose refined motion estimation via efficient registration prior to inter-prediction, which addresses the challenge of searching the temporal correspondence between neighboring frames of irregular point clouds.

  • We present a complete framework of dynamic point cloud attribute compression, consisting of our previous intra-coding and the proposed inter-coding. The optimal coding mode is determined from rate-distortion optimization with the proposed offline-trained λ-Q model.

The remainder of this paper is organized as follows. Sec. II reviews related works on point cloud compression. Sec. III introduces relevant concepts in spectral graph theory and the generalized GFT. Next, we elaborate on the proposed optimal inter-prediction and predictive transform in Sec. IV. Then we discuss the complete coding framework with refined motion estimation and the λ-Q model in Sec. V. Experimental results and conclusions are presented in Sec. VI and Sec. VII, respectively.

II Related Work

Previous point cloud compression focuses more on the coding of geometry, such as the spanning tree [16], the kd-tree structure [11, 34], and the octree [38, 39, 25], which organize unstructured point clouds into regular grids. Among them, the octree approach, a counterpart of the quadtree in 2D images, is widely adopted nowadays. Recently, the coding of attributes is drawing increasing attention. We discuss previous works in intra-coding and inter-coding of attributes in order.

II-A Intra-coding of point cloud attributes

The intra-coding of point cloud attributes can be divided into two categories: 1) Codec-based methods, which leverage existing image/video codecs by projecting different viewpoints of point clouds onto depth maps and texture maps. 2) Geometry-based methods, which design transforms based on the geometry and perform transform coding on attributes.

In the first category, Mekuria et al. [31] employ JPEG [49] to encode color information, which is padded into a map via scanning. Mammou et al. [30] introduce video-based point cloud compression (V-PCC), encoding the projected color and depth images via HEVC [43], which further improves the performance. However, the scanning or projection process inevitably introduces losses in the structure and details of point clouds. Also, the characteristics of point clouds are not taken into consideration when encoding via existing image/video codecs.

In the second category, Huang et al. [24] explore color attribute compression and verify that there is considerable redundancy in the color representation within the octree structure. G-PCC [1] is proposed as an open platform by MPEG-3DG, and the Region-adaptive Hierarchical Transform (RAHT) [9] is adopted as one of its attribute coding tools. Specifically, de Queiroz et al. compress the color information of point clouds via a hierarchical sub-band transform [9] that resembles an adaptive variation of a Haar wavelet [17]. They encode the RAHT coefficients with an entropy coder based on arithmetic coding (AC), and further with adaptive run-length Golomb-Rice (RLGR) coding [29], which is a much less complex entropy coder. Zhang et al. [53] propose to encode attributes via the GFT [18] over each sub-cloud. However, their edge weight allocation for graphs is inefficient, since it ignores the characteristics of neighboring points. To address this problem, our previous work proposes a Normal-Weighted GFT (NWGFT) [50]: we first cluster the point cloud according to the point coordinate distribution, which makes each sub-cloud more internally correlated, and then design a novel edge weight allocation method for constructing the GFT by exploiting the similarity in normals of points, so that the correlation within each sub-cloud is further removed.

II-B Inter-coding of point cloud attributes

Similarly, the inter-coding can be classified into two categories: 1) Codec-based methods [30], which leverage existing video codecs (e.g., HEVC) but may lose details when projecting dynamic point clouds to videos. 2) Geometry-based methods, which exploit the temporal correlation by registering or matching neighboring point clouds based on geometry information. Mekuria et al. [31] propose to register neighboring frames of point clouds via ICP, and encode the transformation matrix and other support information to reduce the bit rate. However, complicated regions of point clouds are difficult to register via ICP. Moreover, they deploy ICP to search for corresponding 3D blocks only if two adjacent frames satisfy certain conditions, such as a bounded deviation in color variance; otherwise, they resort to intra-coding, thus limiting the use of ICP. In [45], Thanou et al. represent a dynamic point cloud by a set of graphs and match features over them to perform motion estimation and predictive coding. However, the feature matching step is computationally expensive. Moreover, the inter-coding residual is encoded based on the spatial correlation within each frame instead of the temporal correlation, leading to sub-optimal inter-coding.

III Background in Graph Fourier Transform

III-A Graph, Graph Laplacian and Graph Fourier Transforms

We consider an undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}\}$ composed of a vertex set $\mathcal{V}$ of cardinality $|\mathcal{V}| = N$, an edge set $\mathcal{E}$ connecting vertices, and a weighted adjacency matrix $\mathbf{W}$. $\mathbf{W}$ is a real symmetric $N \times N$ matrix, where $w_{i,j}$ is the weight assigned to the edge connecting vertices $i$ and $j$. We assume non-negative weights, i.e., $w_{i,j} \geq 0$.

The graph Laplacian matrix is then defined from the adjacency matrix. Among different variants of Laplacian matrices, the combinatorial graph Laplacian used in [40, 20, 21] is defined as $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where $\mathbf{D}$ is the degree matrix, a diagonal matrix with $d_{i,i} = \sum_{j=1}^{N} w_{i,j}$.

The combinatorial graph Laplacian is real and symmetric, which means it admits a complete set of orthonormal eigenvectors. The GFT basis $\mathbf{U}$ is then the eigenvector set of the Laplacian matrix. For a given signal $\mathbf{f} \in \mathbb{R}^N$ defined on the graph (i.e., a graph signal), the formal definition of its GFT is

(1) $\boldsymbol{\alpha} = \mathbf{U}^{\top} \mathbf{f}$

The inverse GFT follows as

(2) $\mathbf{f} = \mathbf{U} \boldsymbol{\alpha}$

The GFT is a content-adaptive linear transform and has been shown to be superior in compressing certain types of signals, e.g., mesh geometry [26], depth maps [41, 7], and images/videos [54, 23].
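As a concrete illustration of (1) and (2), the combinatorial Laplacian and the GFT can be computed with a few lines of NumPy. This is a minimal sketch; the graph and signal below are toy examples, not data from the paper:

```python
import numpy as np

# Toy undirected graph on 4 vertices: symmetric, non-negative adjacency.
W = np.array([[0.0, 1.0, 0.0, 0.5],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 0.0]])

D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # combinatorial graph Laplacian

# GFT basis: orthonormal eigenvectors of the (symmetric) Laplacian.
eigvals, U = np.linalg.eigh(L)

f = np.array([3.0, 2.9, 1.1, 1.0])  # a graph signal (e.g., luminance)
alpha = U.T @ f                     # forward GFT, Eq. (1)
f_rec = U @ alpha                   # inverse GFT, Eq. (2)

assert np.allclose(f, f_rec)        # orthonormal basis: exact reconstruction
```

Since `eigh` returns an orthonormal basis for a symmetric matrix, the inverse transform recovers the signal exactly, which is why the transform itself is lossless and all rate savings come from coefficient quantization and entropy coding.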

III-B The Generalized Graph Laplacian

A generalized Laplacian (or discrete Schrödinger operator) is a symmetric matrix with non-positive off-diagonal entries [4]. Defining a diagonal matrix $\mathbf{V}$, one can rewrite a generalized Laplacian $\mathcal{L}$ as

(3) $\mathcal{L} = \mathbf{L} + \mathbf{V}$

where the diagonal entries of $\mathbf{V}$ can be viewed as a potential defined on the vertices, e.g., a boundary condition [22]. It can also be written as

(4) $\mathcal{L} = \mathbf{D}_{\mathcal{L}} - \mathbf{W}$

where $\mathbf{D}_{\mathcal{L}}$ is the degree matrix of the generalized Laplacian:

(5) $\mathbf{D}_{\mathcal{L}} = \mathbf{D} + \mathbf{V}$

III-C The Generalized Graph Fourier Transform

The GGFT was first proposed in [22] as an image transform optimized for the intra-prediction residual in images. Following the definition of the GFT, the basis of the GGFT is the eigenvector set $\mathbf{U}_{\mathcal{L}}$ of the generalized graph Laplacian $\mathcal{L}$. The GGFT is then defined as

(6) $\boldsymbol{\alpha} = \mathbf{U}_{\mathcal{L}}^{\top} \mathbf{f}$

The inverse GGFT follows as

(7) $\mathbf{f} = \mathbf{U}_{\mathcal{L}} \boldsymbol{\alpha}$

In the generalized graph Laplacian in [22], the nonzero diagonal entries of $\mathbf{V}$ correspond to vertices at image block boundaries, with an extra weight added as a function of the expected inaccuracy of intra-prediction. It is analyzed in [22] that the GGFT can be viewed as an extension of widely used transforms, namely the discrete cosine transform (DCT) and the asymmetric discrete sine transform (ADST) [19].
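To make (3)-(7) concrete, the following sketch builds a generalized Laplacian by adding a diagonal potential $\mathbf{V}$ to the combinatorial Laplacian of a path graph and uses its eigenvectors as the GGFT basis. The graph and the unit boundary weight are illustrative choices, not values from [22]:

```python
import numpy as np

n = 6
# Path graph of n vertices with unit edge weights.
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W                          # combinatorial Laplacian

# Diagonal potential: extra weight on the first vertex, modeling a
# prediction boundary condition as in Eq. (3) (illustrative value).
V = np.zeros((n, n))
V[0, 0] = 1.0
L_gen = L + V                      # generalized Laplacian, Eq. (3)

eigvals, U_gen = np.linalg.eigh(L_gen)

f = np.linspace(1.0, 0.0, n)       # toy residual-like signal
alpha = U_gen.T @ f                # GGFT, Eq. (6)
f_rec = U_gen @ alpha              # inverse GGFT, Eq. (7)
```

With zero potential, the eigenvectors of the path-graph Laplacian recover a DCT basis, while a unit potential at one end yields an ADST-like basis, which is the sense in which the GGFT generalizes those transforms.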

IV Optimal Inter-prediction and Transform

In this section, we derive the optimal inter-prediction and transform under GMRF modeling of a dynamic point cloud sequence. We first propose spatio-temporal graph construction, which serves as the underlying graph for GMRF. The optimal inter-prediction and transform is then deduced from conditional dependencies encoded in the precision matrix in GMRF. Due to the estimation limitation of the precision matrix in practice, we interpret the precision matrix by the generalized graph Laplacian, which leads to the final optimal inter-prediction and predictive GGFT.

IV-A Proposed Spatio-temporal Graph Construction

Given an input point cloud sequence $\mathcal{P} = \{P_1, \ldots, P_T\}$, where $T$ is the number of frames in $\mathcal{P}$, we consider two neighboring frames $P_{t-1}$ and $P_t$. Assuming availability of geometry at both the encoder and decoder as well as certain correlation between geometry and attributes, we partition the target frame $P_t$ into clusters based on geometry in order to efficiently exploit the temporal correlation. Then we perform motion estimation to acquire point-to-point correspondence between each cluster in $P_t$ and points in $P_{t-1}$ based on geometry. The details of geometry-based clustering and motion estimation will be discussed in Sec. V-B, while we focus on the graph construction over each cluster here.

Considering the target cluster $\mathcal{C}_t$ in $P_t$, where $n$ is the cardinality of points in $\mathcal{C}_t$, we assume the corresponding points in $P_{t-1}$ form a set $\mathcal{C}_{t-1}$. We construct a spatio-temporal graph over $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$ to encode spatio-temporal correlations, which prepares the ground for the subsequent inter-prediction and predictive transform. Specifically, we treat each point as a vertex in the graph, and build connections including spatial connectivities within $\mathcal{C}_t$ and temporal connectivities between corresponding points of $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$. We discuss the spatial and temporal connectivities in order.

IV-A1 Spatial graph connectivities

We construct spatial connectivities in $\mathcal{C}_t$ based on global and local features as in our previous work [50]. We employ the Euclidean distance between 3D coordinates as the global feature to decide the $k$-nearest neighbors of each point, which constitute a local surface. Then we estimate the normal vector of each surface as the local feature to compute edge weights in a Gaussian kernel. More details will be described in Sec. V-C.

IV-A2 Temporal graph connectivities

We build temporal connections based on the point-to-point correspondence acquired from refined motion estimation. Specifically, we connect each point in the target cluster $\mathcal{C}_t$ of $P_t$ to its corresponding point in $\mathcal{C}_{t-1}$ of $P_{t-1}$. While it is possible to set the weights of temporal edges as a Gaussian kernel of the distance between the 3D coordinates of corresponding points, we assign all the temporal edge weights as 1 for simplicity.

IV-B Optimal Prediction and Transform under GMRF

We propose optimal prediction and transform based on the constructed spatio-temporal graph. In particular, we first derive optimal prediction and transform statistically under the GMRF modeling of point clouds.

IV-B1 Preliminaries in Gaussian Markov Random Field

We model the spatio-temporal correlation of dynamic point clouds on GMRFs. The formal definition of a GMRF is as follows.

Definition: A random vector $\mathbf{x} \in \mathbb{R}^N$ is called a GMRF with respect to the graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ with mean $\boldsymbol{\mu}$ and a precision matrix $\mathbf{Q}$ (positive definite), if and only if its density has the form

(8) $p(\mathbf{x}) = (2\pi)^{-N/2} |\mathbf{Q}|^{1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top} \mathbf{Q} (\mathbf{x}-\boldsymbol{\mu})\right)$

and

(9) $q_{i,j} \neq 0 \Longleftrightarrow \{i,j\} \in \mathcal{E}, \ \forall i \neq j$

This definition implies that a signal $\mathbf{x}$ modeled by a GMRF follows a multivariate Gaussian distribution, with the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma} = \mathbf{Q}^{-1}$, the inverse of $\mathbf{Q}$. The precision matrix is deployed in the definition of the GMRF for its conditional interpretations:

(10) $\mathbb{E}[x_i \mid \mathbf{x}_{-i}] = \mu_i - \dfrac{1}{q_{i,i}} \sum_{j \in \mathcal{N}_i} q_{i,j} (x_j - \mu_j)$

(11) $\mathrm{Prec}(x_i \mid \mathbf{x}_{-i}) = q_{i,i}$

where $\mathbf{x}_{-i}$ denotes all elements in $\mathbf{x}$ except $x_i$, and $\mathcal{N}_i$ represents all nodes that are neighbors of node $i$ in the graph. (10) and (11) give the conditional expectation and precision of $x_i$ given all other elements based on parameters of the GMRF, which will be leveraged to derive an optimal inter-prediction and predictive transform.
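The conditional interpretations (10)-(11) can be checked numerically: for a small synthetic GMRF, the conditional mean of one entry computed from the precision matrix must agree with the one obtained by standard Gaussian conditioning on the covariance. This is a self-contained numerical check, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random positive-definite precision matrix Q.
A = rng.normal(size=(5, 5))
Q = A @ A.T + 5.0 * np.eye(5)
Sigma = np.linalg.inv(Q)           # covariance matrix
mu = rng.normal(size=5)
x = rng.normal(size=5)             # an observed realization

i = 2
rest = [j for j in range(5) if j != i]

# Conditional mean via the precision matrix, Eq. (10).
mean_prec = mu[i] - (Q[i, rest] @ (x[rest] - mu[rest])) / Q[i, i]

# Conditional mean via standard Gaussian conditioning on Sigma.
S_ir = Sigma[np.ix_([i], rest)]
S_rr = Sigma[np.ix_(rest, rest)]
mean_cov = mu[i] + (S_ir @ np.linalg.solve(S_rr, x[rest] - mu[rest]))[0]

assert np.isclose(mean_prec, mean_cov)
# Conditional precision, Eq. (11), is simply Q[i, i].
```

The precision-matrix form only touches the neighbors of node i, which is exactly the Markov property the derivation below exploits.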

Fig. 2: The proposed coding framework for attributes of 3D dynamic point clouds.

IV-B2 Optimal prediction and transform

We assume the attributes of $\mathcal{C}_{t-1}$ and $\mathcal{C}_t$, denoted as signals $\mathbf{x}_{t-1}$ and $\mathbf{x}_t$, follow the GMRF model, with means $\boldsymbol{\mu}_{t-1}$ and $\boldsymbol{\mu}_t$, and precision matrices $\mathbf{Q}_{t-1}$ and $\mathbf{Q}_t$, respectively.

Given the reference set $\mathcal{C}_{t-1}$, the inter-prediction problem is essentially predicting $\mathbf{x}_t$ from $\mathbf{x}_{t-1}$. As discussed in [54], the optimal inter-prediction is the conditional expectation of $\mathbf{x}_t$ given $\mathbf{x}_{t-1}$ under the GMRF model; any other predictor yields a larger expected prediction error. For the resulting prediction residual, the optimal predictive transform basis is the eigenvector set of the conditional precision matrix, i.e., the Karhunen-Loève Transform (KLT) [48], which optimally decorrelates the residual under the GMRF.

In particular, we group $\mathbf{x}_{t-1}$ and $\mathbf{x}_t$ into $\mathbf{x} = [\mathbf{x}_{t-1}^{\top}, \mathbf{x}_t^{\top}]^{\top}$, with mean $\boldsymbol{\mu} = [\boldsymbol{\mu}_{t-1}^{\top}, \boldsymbol{\mu}_t^{\top}]^{\top}$ and precision matrix $\mathbf{Q}$. Then we partition the precision matrix as

(12) $\mathbf{Q} = \begin{bmatrix} \mathbf{Q}_{t-1,t-1} & \mathbf{Q}_{t-1,t} \\ \mathbf{Q}_{t,t-1} & \mathbf{Q}_{t,t} \end{bmatrix}$

where $\mathbf{Q}_{t-1,t-1}, \mathbf{Q}_{t,t} \in \mathbb{R}^{n \times n}$, and $\mathbf{Q}_{t,t-1} = \mathbf{Q}_{t-1,t}^{\top}$.

As mentioned in [37], $\mathbf{x}_t \mid \mathbf{x}_{t-1}$ is also a GMRF with mean $\boldsymbol{\mu}_{t|t-1}$ and precision matrix $\mathbf{Q}_{t|t-1}$, where

(13) $\boldsymbol{\mu}_{t|t-1} = \boldsymbol{\mu}_t - \mathbf{Q}_{t,t}^{-1} \mathbf{Q}_{t,t-1} (\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1})$

(14) $\mathbf{Q}_{t|t-1} = \mathbf{Q}_{t,t}$

Next, we discuss the specific form of $\mathbf{Q}_{t,t-1}$ and $\mathbf{Q}_{t,t}$ based on our graph construction. According to (9), $\mathbf{Q}_{t,t-1}$ represents the temporal connectivity between $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$. Since we assign the edge weight between each pair of temporally corresponding points in $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$ as 1, we have

(15) $\mathbf{Q}_{t,t-1} = -\mathbf{I}$

where $\mathbf{I}$ is an $n \times n$ identity matrix, and

(16) $\mathbf{Q}_{t,t} = \mathbf{Q}_t + \mathbf{I}$

where $\mathbf{Q}_t$ encodes the spatial connectivity within $\mathcal{C}_t$ and $\mathbf{I}$ arises from the weight-1 temporal edges. Substituting (15) and (16) into (13), we have the optimal prediction as

(17) $\hat{\mathbf{x}}_t = \boldsymbol{\mu}_t + (\mathbf{Q}_t + \mathbf{I})^{-1} (\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1})$

Further, assuming zero-mean for both $\mathbf{x}_{t-1}$ and $\mathbf{x}_t$, i.e., $\boldsymbol{\mu}_{t-1} = \boldsymbol{\mu}_t = \mathbf{0}$, we have

(18) $\hat{\mathbf{x}}_t = (\mathbf{Q}_t + \mathbf{I})^{-1} \mathbf{x}_{t-1}$

Substituting (16) into (14), the optimal predictive transform basis is the eigenvector set of

(19) $\mathbf{Q}_{t|t-1} = \mathbf{Q}_t + \mathbf{I}$
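The derivation above can be verified numerically on a toy spatio-temporal precision matrix with unit temporal edges: the closed-form predictor of (18) must coincide with the exact Gaussian conditional mean. This is a sanity-check sketch under the zero-mean assumption, with synthetic matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Spatial precision of the current cluster, built as a combinatorial Laplacian.
B = rng.uniform(0.0, 1.0, size=(n, n))
W_s = np.triu(B, 1) + np.triu(B, 1).T        # symmetric non-negative weights
Q_t = np.diag(W_s.sum(axis=1)) - W_s         # spatial Laplacian (PSD)

# Joint precision, Eq. (12), with temporal coupling Q_{t,t-1} = -I (Eq. (15))
# and Q_{t,t} = Q_t + I (Eq. (16)). The reference block is chosen so that the
# joint precision stays positive definite.
I = np.eye(n)
Q_ref = Q_t + 2.0 * I                        # arbitrary SPD reference block
Q = np.block([[Q_ref, -I],
              [-I, Q_t + I]])
assert np.all(np.linalg.eigvalsh(Q) > 0)     # joint precision is SPD

Sigma = np.linalg.inv(Q)
x_prev = rng.normal(size=n)                  # observed reference attributes

# Exact conditional mean of x_t given x_{t-1} (zero means), via covariance:
S_tp = Sigma[n:, :n]
S_pp = Sigma[:n, :n]
mean_exact = S_tp @ np.linalg.solve(S_pp, x_prev)

# Closed-form optimal prediction, Eq. (18):
x_hat = np.linalg.solve(Q_t + I, x_prev)

assert np.allclose(mean_exact, x_hat)
```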

IV-C Proposed Inter-Prediction and Predictive GGFT

The derived optimal prediction and transform in (18) and (19) under GMRF depends on the precision matrix, which is however often difficult to estimate given a single observation of a dynamic point cloud sequence. Instead, we interpret the precision matrix by the generalized graph Laplacian matrix, and thus deploy the generalized graph Laplacian for the optimal inter-prediction and predictive transform.

As discussed in [35, 12, 52], the precision matrix in a general GMRF can be interpreted as a generalized graph Laplacian $\mathcal{L}$, i.e.,

(20) $\mathbf{Q} = \mathcal{L}$

Combining (18), (19) and (20), we derive the final optimal inter-prediction as

(21) $\hat{\mathbf{x}}_t = \mathcal{L}_t^{-1} \mathbf{x}_{t-1}$

where $\mathcal{L}_t$ denotes the generalized graph Laplacian for $\mathcal{C}_t$. As the graph Laplacian is a high-pass filter [42], $\mathcal{L}_t^{-1}$ corresponds to a low-pass filter. Hence, (21) indicates that the prediction of $\mathbf{x}_t$ is a low-pass filtered version of $\mathbf{x}_{t-1}$.

Accordingly, the optimal predictive transform for the resulting prediction residual is the GGFT computed from $\mathcal{L}_t$, i.e.,

(22) $\boldsymbol{\alpha} = \mathbf{U}_{\mathcal{L}_t}^{\top} (\mathbf{x}_t - \hat{\mathbf{x}}_t)$

where $\mathbf{U}_{\mathcal{L}_t}$ is the eigenvector set of $\mathcal{L}_t$.

By incorporating temporal dependencies between adjacent frames, GGFT is optimal in terms of full spatio-temporal decorrelation.

Further, as $\mathcal{L}_t$ includes both the spatial connectivities within $\mathcal{C}_t$ and the temporal connectivities between $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$ as presented in (16), we decompose it into the following for simpler computation:

(23) $\mathcal{L}_t = \mathbf{L}_s + \mathbf{I}$

where $\mathbf{L}_s$ encodes the spatial connectivities of $\mathcal{C}_t$ and $\mathbf{I}$ encodes the temporal connectivities to $\mathcal{C}_{t-1}$. Here $\mathbf{I}$ corresponds to $\mathbf{V}$ in (3), which can be viewed as the boundary condition of each cluster in the current frame.
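Putting (21)-(23) together, one inter-coding step for a cluster reduces to a low-pass prediction followed by a GGFT of the residual. A minimal sketch with a synthetic spatial Laplacian and attributes, with quantization and entropy coding omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Synthetic spatial combinatorial Laplacian L_s for the current cluster.
B = rng.uniform(0.0, 1.0, size=(n, n))
W_s = np.triu(B, 1) + np.triu(B, 1).T
L_s = np.diag(W_s.sum(axis=1)) - W_s

# Generalized Laplacian with temporal boundary condition, Eq. (23).
L_gen = L_s + np.eye(n)

# Reference attributes (corresponding points in the previous frame) and
# current attributes, assumed zero-mean and temporally correlated.
x_prev = rng.normal(size=n)
x_curr = 0.9 * x_prev + 0.1 * rng.normal(size=n)

# Optimal inter-prediction, Eq. (21): low-pass filtering of x_prev.
x_hat = np.linalg.solve(L_gen, x_prev)

# GGFT of the prediction residual, Eq. (22).
residual = x_curr - x_hat
eigvals, U = np.linalg.eigh(L_gen)
coeffs = U.T @ residual            # coefficients to quantize and entropy-code

# Decoder side: inverse GGFT plus the same prediction reconstructs x_curr.
x_dec = x_hat + U @ coeffs
assert np.allclose(x_dec, x_curr)
```

Because the decoder can recompute both the prediction and the GGFT basis from the losslessly coded geometry, only the (quantized) coefficients need to be transmitted.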

V Proposed Coding Framework

Fig. 3: The proposed inter encoder for attributes of dynamic point clouds.

Fig. 4: Illustration of the proposed refined motion estimation. The green patch is the target cluster and the yellow one is the reference set.

Based on the derived optimal inter-prediction and predictive transform, we present a complete coding framework for attributes of 3D dynamic point clouds, including the proposed inter-coding and our previously proposed intra-coding [50], as shown in Fig. 2. Further, we design a coding mode decision between the inter-mode and intra-mode for rate-distortion optimization. Note that we assume the geometry information of point clouds is losslessly coded and available at both the encoder and decoder. We discuss the procedures of inter-coding and intra-coding as follows, as well as the designed coding mode decision.

V-A Preprocessing: Voxelization

Different point clouds have various scales of size and precision, which inevitably affects the inter/intra-coding process, as we construct graphs based on local and global features of geometry. For the sake of generality, we preprocess the input point cloud via voxelization, which maps all the points into $J \times J \times J$ bins and thus leads to point clouds with a unified coordinate scale. In our experiments, $J$ is set to 4096. Specifically, a bin (or voxel) is regarded as occupied if it contains at least one point; otherwise it is unoccupied. Then the geometry information of voxels is represented by a set of triples. The attribute (e.g., color) of each voxel is calculated as the average attribute value of all the points within the voxel. At the decoder, we perform devoxelization on the decoded point clouds so as to comply with the objective evaluation metric of MPEG and keep the geometric scale of point clouds unchanged.
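A voxelization pass of this kind can be sketched as follows. The coordinate normalization is a simplified stand-in for the actual preprocessing, and the grid size default mirrors the J = 4096 setting above:

```python
import numpy as np

def voxelize(points, colors, J=4096):
    """Map points into a J x J x J grid; average colors per occupied voxel."""
    # Normalize coordinates to [0, J) (simplified; assumes non-degenerate extent).
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    idx = np.floor((points - mins) / extent.max() * (J - 1)).astype(np.int64)

    # Unique occupied voxels (triples) and the mapping of points to voxels.
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Average the attributes of all points falling into each voxel.
    counts = np.bincount(inverse).astype(np.float64)
    avg = np.zeros((len(voxels), colors.shape[1]))
    for c in range(colors.shape[1]):
        avg[:, c] = np.bincount(inverse, weights=colors[:, c]) / counts
    return voxels, avg

pts = np.array([[0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0001],  # falls into the same voxel as point 0
                [1.0, 1.0, 1.0]])
cols = np.array([[100.0], [200.0], [50.0]])
voxels, avg = voxelize(pts, cols)    # 2 occupied voxels; first averages to 150
```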

Fig. 5: The proposed intra encoder for attributes of dynamic point clouds.

V-B Optimized Inter-Coding

As shown in Fig. 3, the optimized inter-coding consists of four steps: 1) segment each frame of the point cloud into clusters, which serve as processing units, based on geometry; 2) search the temporal correspondence between each cluster in the current frame and a set of points in the previous frame, i.e., refined motion estimation; 3) construct a spatio-temporal graph over each cluster in the current frame and compute its generalized Laplacian matrix; 4) perform optimal inter-prediction and predictive transform. We discuss the four steps as follows, with emphasis on the proposed refined motion estimation.

V-B1 Geometry clustering

To mitigate the computational cost of the eigen-decomposition of the Laplacian matrix, we partition the input point cloud into small clusters. Instead of uniform spatial partition of point clouds [53], which would create many isolated sub-clouds if the point cloud is sparse, we employ K-means clustering [14] based on geometry. The point cloud is divided into K clusters, and we set K so that the average number of points per cluster is a fixed constant, balancing coding performance against computational complexity.
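The geometry clustering step can be sketched with a plain Lloyd-iteration K-means on 3D coordinates. This is a simplified stand-in for [14], and the target cluster size used here is an illustrative placeholder, not the paper's setting:

```python
import numpy as np

def kmeans_geometry(points, target_cluster_size=200, iters=20, seed=0):
    """Cluster a point cloud by 3D coordinates with Lloyd's algorithm.

    K is derived from a target average number of points per cluster
    (the default of 200 is illustrative, not the paper's value).
    """
    rng = np.random.default_rng(seed)
    K = max(1, len(points) // target_cluster_size)
    centers = points[rng.choice(len(points), size=K, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep a center in place if its cluster is empty.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels, centers

# Two synthetic blobs of 300 points each -> K = 2 clusters.
pts = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (300, 3)),
                 np.random.default_rng(2).normal(5.0, 0.1, (300, 3))])
labels, centers = kmeans_geometry(pts, target_cluster_size=300)
```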

V-B2 Refined Motion Estimation

In order to efficiently exploit the temporal correlation between two neighboring frames $P_{t-1}$ and $P_t$, we propose to register the target cluster $\mathcal{C}_t$ in $P_t$ with the previous frame via ICP, and then find point-to-point correspondence in the registered sets.

As demonstrated in Fig. 4, to reduce the registration complexity, we first form a bounding box around $\mathcal{C}_t$ and expand it by a certain percentage to a larger bounding box $B_t$. We then set a bounding box $B_{t-1}$ in $P_{t-1}$, which is collocated with $B_t$ and will be employed to find reference points in $P_{t-1}$. This serves as a refinement step for motion estimation.

Further, we acquire point-to-point correspondence between the target cluster $\mathcal{C}_t$ in $P_t$ and the reference bounding box $B_{t-1}$ in $P_{t-1}$. Specifically, the correspondence of each point in $\mathcal{C}_t$ is its nearest point in $B_{t-1}$ in terms of Euclidean distance. The resulting points form the corresponding set in $P_{t-1}$, denoted as $\mathcal{C}_{t-1}$. As such, we acquire the temporal correspondence between the reference frame $P_{t-1}$ and the current frame $P_t$.

Note that, while we accommodate neighboring frames $P_{t-1}$ and $P_t$ with different numbers of points, the corresponding sets $\mathcal{C}_t$ and $\mathcal{C}_{t-1}$ contain the same number of points by the proposed refined motion estimation.
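The correspondence search above can be sketched as a brute-force nearest-neighbor query against the expanded, collocated bounding box. The ICP pre-alignment is omitted for brevity, and the 10% expansion factor is an illustrative placeholder, not the paper's value:

```python
import numpy as np

def find_correspondence(cluster_t, frame_prev, expand=0.1):
    """For each point of the target cluster, find its reference point.

    1) Build a bounding box around the cluster and expand it (the
       'expand' fraction is an illustrative placeholder).
    2) Collect points of the previous frame inside the collocated box.
    3) Match each cluster point to its nearest candidate in Euclidean
       distance (ICP pre-alignment omitted in this sketch).
    """
    lo, hi = cluster_t.min(axis=0), cluster_t.max(axis=0)
    margin = expand * (hi - lo)
    lo, hi = lo - margin, hi + margin

    inside = np.all((frame_prev >= lo) & (frame_prev <= hi), axis=1)
    candidates = frame_prev[inside]
    if len(candidates) == 0:          # fall back to the whole frame
        candidates = frame_prev

    d2 = ((cluster_t[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    return candidates[d2.argmin(axis=1)]   # one reference point per cluster point

cluster = np.array([[0.0, 0.0, 0.0], [1.0, 0.2, 0.0]])
prev = np.array([[0.05, 0.0, 0.0], [0.9, 0.1, 0.0], [10.0, 10.0, 10.0]])
ref = find_correspondence(cluster, prev)   # one row per cluster point
```

Note that the output always has exactly one reference point per cluster point, which is how the corresponding sets end up with the same cardinality even when the two frames differ in size.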

V-B3 Graph Construction

We construct a spatio-temporal graph over each cluster based on geometry as described in Sec. IV-A, in order to acquire $\mathcal{L}_t$ in (21) and (22) for the subsequent inter-prediction and predictive transform.

V-B4 Inter-Prediction and Predictive Transform

Following graph construction, we compute the generalized Laplacian $\mathcal{L}_t$ as in (23). Then we calculate the optimal prediction via (21), acquire the residual signal, and compute the optimal predictive GGFT of the residual via (22). Note that we may employ the fast GFT algorithm in [27] to accelerate the computation of the GGFT. The resulting transform coefficients are quantized, entropy coded, and transmitted to the decoder.

V-C Intra-Coding with Normal-Weighted GFT

As illustrated in Fig. 5, we adopt our previous algorithm in [50] for intra-coding. The key idea is to capture structural similarities by a Gaussian kernel of normals as edge weights, from which the GFT is computed for compact representation. Specifically, we first perform geometry clustering as in the optimized inter-coding, and then calculate Normal-Weighted Graph Fourier Transform (NWGFT) for each cluster.

In NWGFT, we propose a novel edge weight allocation method for graph construction by exploiting the similarity of normal vectors in the local space, so as to further remove the correlation within each cluster. Specifically, we first build a $\tau$-neighborhood graph, which connects all pairs of points whose pairwise distance is smaller than $\tau$. The choice of $\tau$ depends on the density of the point cloud for capturing the local structure, which we will detail in the experimental setup. Next, we compute the normal vector of a $k$-nearest-neighbor local space around each point. The normal is estimated by decomposing the covariance matrix [15], and serves as a local feature. The edge weight between points $i$ and $j$ is then assigned as:

(24) $w_{i,j} = \exp\!\left(-\theta_{i,j}^2 / \sigma^2\right)$

where $\theta_{i,j}$ is the angle between the two normal vectors $\mathbf{n}_i$ and $\mathbf{n}_j$, and $\sigma$ is a parameter in the Gaussian kernel; both $\tau$ and $\sigma$ are set empirically in our experiments. The proposed edge weight function defined in (24) is more robust than the commonly used Euclidean distance between coordinates [53], since it considers features of each point and its neighborhood via normals.
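The weight of (24) depends only on the angle between neighboring normals. A small sketch of the weight computation, where the value of σ is illustrative rather than the paper's setting:

```python
import numpy as np

def normal_weight(n_i, n_j, sigma=0.5):
    """Edge weight of Eq. (24): Gaussian kernel of the angle between normals.

    sigma is a kernel parameter (the value here is illustrative).
    """
    cos = np.dot(n_i, n_j) / (np.linalg.norm(n_i) * np.linalg.norm(n_j))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return np.exp(-theta ** 2 / sigma ** 2)

# Parallel normals (flat surface) -> maximal weight 1;
# orthogonal normals (sharp edge) -> weight close to 0.
w_same = normal_weight(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]))
w_orth = normal_weight(np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]))
```

Strong edges thus connect points on the same smooth surface, while weak edges across geometric discontinuities keep the GFT from mixing unrelated regions.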

Based on the above graph construction, we compute a combinatorial graph Laplacian to acquire GFT basis for each cluster. Similar to the inter-coding, we may deploy fast GFT [27] to reduce the computation complexity of GFT. The resulting transform coefficients are quantized, entropy encoded and transmitted to the decoder.

Fig. 6: The relationship between the Lagrange multiplier λ and Qstep for different dynamic point clouds.

V-D Coding Mode Decision

Similar to traditional video coding, given the target bit rate, the goal of the attribute compression of dynamic point clouds is to convey the attribute information with minimum possible distortion. In order to achieve the best rate-distortion (RD) performance, a coding mode should be determined from the intra and inter modes to reach the best balance between rate and distortion. This can be cast into the classical RD framework, which is formulated as pursuing the best quality under the limitation of a given target rate [44, 55],

(25) $\min_m D_m \quad \text{s.t.} \quad R_m \leq R_T$

where $R_T$ indicates the target bits, and $D_m$ and $R_m$ denote the distortion and coding bits of the $m$-th coding mode, respectively. This constrained problem can be converted into the unconstrained Lagrangian rate-distortion optimization problem

(26) $J_m = D_m + \lambda R_m$

where $J_m$ is the RD cost of mode $m$, and $\lambda$ is the Lagrange multiplier which controls the trade-off between rate and distortion. In particular, $\lambda$ is determined by the quantization step $Q_{\text{step}}$, leading to a λ-Q model. Given $\lambda$, the optimal mode with the lowest RD cost can be identified. We discuss the RD calculation and the λ-Q model in detail below.
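The mode decision of (26) is then a simple argmin over the candidate modes' RD costs. A sketch with made-up per-cluster statistics:

```python
def best_mode(stats, lam):
    """Pick the coding mode minimizing J_m = D_m + lambda * R_m, Eq. (26).

    stats: dict mapping mode name -> (distortion D_m, rate R_m).
    """
    costs = {m: d + lam * r for m, (d, r) in stats.items()}
    return min(costs, key=costs.get), costs

# Hypothetical statistics: inter saves rate at a slight distortion cost.
stats = {"intra": (10.0, 2.0), "inter": (11.0, 1.0)}
mode_low_rate, _ = best_mode(stats, lam=2.0)   # large lambda favors low rate
mode_low_dist, _ = best_mode(stats, lam=0.1)   # small lambda favors low distortion
```

With λ = 2.0 the inter mode wins (J = 13 vs. 14), while with λ = 0.1 the intra mode wins (J = 10.2 vs. 11.1), illustrating how λ steers the trade-off.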

V-D1 Rate and Distortion Calculation

We calculate the coding rate R_i of mode i in terms of bytes per voxel after encoding the attributes of a frame. The distortion D_i is then obtained as the average of the Mean Squared Error (MSE) over the YUV components:

D_i = (D_Y + D_U + D_V) / 3,    (27)

where D_Y, D_U and D_V denote the distortion between the original and reconstructed frames in the Y, U and V components, respectively.

V-D2 λ-Q Model

In traditional image and video compression, the Lagrange multiplier λ as a function of Q can be trained offline from the statistics of images or videos. Analogously, in the context of point cloud compression, we learn an appropriate λ-Q model for efficient RD optimization. Setting the derivative of J in (26) with respect to the quality factor Q to 0,

∂J/∂Q = ∂D/∂Q + λ · ∂R/∂Q = 0,    (28)

we have

λ = −(∂D/∂Q) / (∂R/∂Q) = −∂D/∂R.    (29)

This implies that λ characterizes the (negative) slope of the rate-distortion curve. In our previous work [51], we presented the rate-distortion curves of different static point clouds, and showed that the rate-distortion relationships of various static point clouds are quite close. Therefore, we derive the λ-Q model offline based on the statistics of point clouds. Discretizing (29), we approximate the slope of the rate-distortion curve using neighboring rate and distortion points as

λ ≈ −(D_{k+1} − D_k) / (R_{k+1} − R_k),    (30)

where (R_k, D_k) and (R_{k+1}, D_{k+1}) are rate-distortion points measured at neighboring quantization factors.

For different dynamic point clouds, the relationships between λ and the quality factor Q are plotted in Fig. 6. We obtain the best approximation of λ as a power function of Q,

λ = a · Q^b,    (31)

where a and b are constants obtained by offline fitting.
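The offline derivation of the λ-Q model in (29)–(31) can be sketched as follows: approximate λ at each operating point by the finite-difference slope of measured RD points, then fit the power function by linear regression in log-log space. The function names and the synthetic data below are illustrative, not the paper's actual training set.

```python
import numpy as np

def lambda_samples(Q, R, D):
    """Approximate lambda = -dD/dR at each operating point via neighboring
    rate-distortion measurements, as in (30)."""
    Q, R, D = map(np.asarray, (Q, R, D))
    lam = -(D[1:] - D[:-1]) / (R[1:] - R[:-1])
    Q_mid = 0.5 * (Q[1:] + Q[:-1])   # associate each slope with a mid Q
    return Q_mid, lam

def fit_power_law(Q, lam):
    """Fit lambda = a * Q**b, as in (31), by linear regression in
    log-log space: log(lambda) = log(a) + b * log(Q)."""
    b, log_a = np.polyfit(np.log(Q), np.log(lam), 1)
    return np.exp(log_a), b
```

On samples that lie exactly on a power law, the fit recovers a and b; on real RD statistics it returns the least-squares power curve, which is what Fig. 6 visualizes.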

Based on the RD optimization and the λ-Q model, the best mode for each cluster can be determined.

Vi Experimental Results

Fig. 7: Rate-Distortion curves for the proposed method, RAHT and NWGFT.

Fig. 8: Rendering results for point clouds Ricardo (frame0009), Soldier (frame0547) and Phil (frame0007) compressed under similar rate. (a) NWGFT. (b) RAHT. (c) Proposal. (d) Input.

Vi-a Experimental Setup

To validate the proposed complete framework for attribute compression of dynamic 3D point clouds, we compare with two competitive methods: 1) the Region-Adaptive Hierarchical Transform (RAHT) [9] (reference implementation at https://github.com/digitalivp/RAHT), which has been adopted in geometry-based point cloud compression (G-PCC), a widely deployed open-source point cloud compression framework introduced by the 3D Graphics (3DG) group of MPEG, and is the state-of-the-art method for attribute coding of static point clouds [1]; 2) our previous intra-coding method, the Normal-Weighted Graph Fourier Transform (NWGFT) [50]. Both RAHT and NWGFT are designed for attribute coding of static point clouds, so we apply them to each frame of a dynamic point cloud sequence separately for comparison. Also, both methods are geometry-based, as is our proposal, for a fair comparison.

We conduct experiments on two sources of datasets: 4 MPEG sequences (Longdress, Loot, Redandblack and Soldier) from [13] and 5 MSR sequences (Andrew9, David9, Phil9, Ricardo9 and Sarah9) from [28]. For the convenience of experiments, 16 frames are selected from each sequence and the group-of-pictures (GOP) size is set to 8. The low-delay P (LDP) configuration is adopted in our experiments. We set the parameter ε of the ε-neighborhood graph used in Sec. V-C to 50 for the first dataset and 300 for the second dataset, due to the different densities of the point clouds. The reconstruction quality is computed via the MPEG evaluation metric software for PCC [46]. An arithmetic coder [36] is used for entropy coding, and the bit rate is measured in bits per input point (BPIP).

Point Clouds BD-BR (Y) BD-BR (U) BD-BR (V)
Longdress -3.8% -2.7% -2.8%
Loot -19.0% -17.9% -18.6%
Redandblack -5.1% -4.3% -5.0%
Soldier -14.1% -13.0% -13.5%
Andrew -6.9% -6.4% -6.0%
David -17.7% -19.3% -19.1%
Phil -5.5% -3.9% -4.9%
Ricardo -21.4% -23.9% -22.4%
Sarah -16.6% -17.0% -17.2%
Average -12.2% -12.0% -12.2%
TABLE I: Performance Comparison with NWGFT.
Point Clouds BD-BR (Y) BD-BR (U) BD-BR (V)
Longdress -10.0% -21.8% -18.3%
Loot -8.2% -29.0% -32.3%
Redandblack -14.8% -17.5% -25.1%
Soldier -13.7% -17.3% -19.0%
Andrew -4.4% 1.0% 3.7%
David -13.9% -13.5% -9.3%
Phil -6.6% -3.9% -3.8%
Ricardo -22.9% -26.0% -26.2%
Sarah -25.8% -32.1% -26.2%
Average -13.4% -17.8% -17.4%
TABLE II: Performance Comparison with RAHT.

Vi-B Experimental Results

Vi-B1 Objective Comparison

Fig. 7 shows the RD curves of the different point cloud compression methods. Our method outperforms both NWGFT and RAHT on all test sequences over a wide range of BPIP. Specifically, compared with NWGFT, we reduce the bit rate by 12.2%, 12.0% and 12.2% on average for the Y, U and V components respectively, as listed in Table I. These numbers are calculated using BD-BR [5], which quantifies the average bit-rate difference between two RD curves. Also, comparing the measurement points on the RD curves of the proposal and NWGFT, we observe that the proposal significantly reduces the bit rate with little quality loss, owing to the efficient inter-coding mode. This verifies that the proposed optimal inter-prediction and GGFT lead to a more compact representation of dynamic point clouds.

Compared with the state-of-the-art method RAHT, we reduce the bit rate by 13.4%, 17.8% and 17.4% on average for the Y, U and V components respectively, as presented in Table II. Although both methods are geometry-based, our proposed optimal inter-prediction and GGFT further remove the temporal correlation, thus achieving a higher compression ratio. In particular, for point clouds with slow motion and simple texture such as Ricardo and Sarah, the proposal achieves more than 20% bit rate reduction. Even for point clouds with richer texture information such as Andrew and Phil, the proposal achieves 4.4% and 6.6% bit rate reduction over RAHT, respectively. Besides, due to the overhead of side information such as coding modes, our performance improvement at low bit rates becomes smaller for the sequence Loot.
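The BD-BR metric [5] used in Tables I and II follows the standard Bjøntegaard procedure: fit log-rate as a cubic polynomial in quality (PSNR) for each codec, integrate both fits over the overlapping quality range, and report the average rate difference in percent (negative means bit savings). A minimal sketch of the computation (our own implementation, not the authors' evaluation script):

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjøntegaard delta bit rate: average log-rate difference between two
    fitted RD curves over their overlapping PSNR range, in percent."""
    # Fit log-rate as a cubic polynomial in PSNR for each curve.
    p_ref = np.polyfit(psnr_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Overlapping PSNR interval of the two curves.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    # Average each polynomial over [lo, hi] via its antiderivative.
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_ref = (np.polyval(int_ref, hi) - np.polyval(int_ref, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    # Back from log-rate to a percentage rate difference.
    return (np.exp(avg_test - avg_ref) - 1.0) * 100.0
```

For instance, a test curve that needs half the rate of the reference at every quality level yields a BD-BR of −50%.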

Vi-B2 Subjective Comparison

To evaluate the subjective performance, we compare the reconstruction results of the proposed method with those of the other methods at similar rates in Fig. 8. The proposed algorithm not only preserves more details in regions with abundant texture, but also avoids artifacts in smooth regions. Specifically, as illustrated in Fig. 8, the reconstructions of NWGFT and RAHT exhibit obvious blocking artifacts, while our results mitigate such artifacts significantly. Further, the proposal preserves high-frequency texture information better than NWGFT and RAHT, as shown in the reconstructed gun of Soldier.

Vii Conclusion

We propose a complete compression framework for attributes of 3D dynamic point clouds, assuming the availability of geometry at both the encoder and decoder. We represent point clouds on graphs and model the attributes with a GMRF, from which we derive the optimal inter-prediction and predictive transform, the Generalized Graph Fourier Transform (GGFT), for temporal decorrelation. We also remove spatial redundancy via the Normal-Weighted Graph Fourier Transform (NWGFT) in the intra-coding mode. The optimal coding mode is then determined by rate-distortion optimization with the proposed offline-trained λ-Q model. Extensive experimental results show that we significantly outperform state-of-the-art point cloud compression methods, validating the decorrelation effectiveness of the proposed framework.

References

  • [1] 3DG (2019-03) Information technology — MPEG-I (coded representation of immersive media) — part 9: geometry-based point cloud compression. In ISO/IEC JTC1/SC29/WG11 (MPEG) output document w18179, Cited by: §II-A, §VI-A.
  • [2] A. Anis, P. A. Chou, and A. Ortega (2016) Compression of dynamic 3d point clouds using subdivisional meshes and graph wavelet transforms. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6360–6364. Cited by: §I.
  • [3] P. J. Besl and N. D. McKay (1992) Method for registration of 3-d shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, Vol. 1611, pp. 586–607. Cited by: §I.
  • [4] T. Biyikoglu, J. Leydold, and P. F. Stadler (2005) Nodal domain theorems and bipartite subgraphs. Cited by: §III-B.
  • [5] G. Bjøntegaard (2008-07) Improvement of BD-PSNR model. Note: VCEG-AI11 Cited by: §VI-B1.
  • [6] Y. Chen and G. Medioni (1992) Object modelling by registration of multiple range images. Image and vision computing 10 (3), pp. 145–155. Cited by: §I.
  • [7] G. Cheung, W. Kim, A. Ortega, J. Ishida, and A. Kubota (2011) Depth map coding using graph based transform and transform domain sparsification. In Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on, pp. 1–6. Cited by: §III-A.
  • [8] F. R. K. Chung (1997) Spectral graph theory. CBMS Regional Conference Series in Mathematics, Vol. 92, American Mathematical Society. Cited by: §I.
  • [9] R. L. de Queiroz and P. A. Chou (2016) Compression of 3d point clouds using a region-adaptive hierarchical transform. IEEE Transactions on Image Processing 25 (8), pp. 3947–3956. Cited by: §I, §I, §I, §II-A, §VI-A.
  • [10] R. L. de Queiroz and P. A. Chou (2017) Motion-compensated compression of point cloud video. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 1417–1421. Cited by: §I.
  • [11] O. Devillers and P. Gandoin (2000) Geometric compression for interactive transmission. In Proceedings Visualization 2000. VIS 2000 (Cat. No. 00CH37145), pp. 319–326. Cited by: §II.
  • [12] H. E. Egilmez, E. Pavez, and A. Ortega (2017) Graph learning from data under Laplacian and structural constraints. IEEE Journal of Selected Topics in Signal Processing 11 (6), pp. 825–841. Cited by: §IV-C.
  • [13] E. d'Eon, B. Harrison, T. Myers, and P. A. Chou (2017-Jan.) 8i voxelized full bodies, version 2 – a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m40059/M74006. Cited by: Fig. 1, §VI-A.
  • [14] M. Everingham and A. Zisserman (2005) Automated detection and identification of persons in video using a coarse 3-d head model and multiple texture maps. IEE Proceedings-Vision, Image and Signal Processing 152 (6), pp. 902–910. Cited by: §V-B1.
  • [15] K. Pearson (1901) LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, Vol. 2, pp. 559–572. Cited by: §V-C.
  • [16] S. Gumhold, Z. Kami, M. Isenburg, and H. Seidel (2005) Predictive point-cloud compression.. In Siggraph Sketches, pp. 137. Cited by: §II.
  • [17] A. Haar (1910) Zur theorie der orthogonalen funktionensysteme. Mathematische Annalen 69 (3), pp. 331–371. Cited by: §II-A.
  • [18] D. K. Hammond, P. Vandergheynst, and R. Gribonval (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150. Cited by: §II-A.
  • [19] J. Han, A. Saxena, V. Melkote, and K. Rose (2012-04) Jointly optimized spatial prediction and block transform for video and image coding. IEEE Transactions on Image Processing 21 (4), pp. 1874–1884. External Links: Document, ISSN 1057-7149 Cited by: §III-C.
  • [20] W. Hu, G. Cheung, X. Li, and O. C. Au (2012-09) Depth map compression using multi-resolution graph-based transform for depth-image-based rendering. In IEEE International Conference on Image Processing, Orlando, FL, pp. 1297–1300. Cited by: §III-A.
  • [21] W. Hu, G. Cheung, A. Ortega, and O. C. Au (2015-01) Multi-resolution graph fourier transform for compression of piecewise smooth images. In IEEE Transactions on Image Processing, Vol. 24, pp. 419–33. Cited by: §III-A.
  • [22] W. Hu, G. Cheung, and A. Ortega (2015-11) Intra-prediction and generalized graph Fourier transform for image coding. IEEE Signal Processing Letters 22 (11), pp. 1913–1917. External Links: Document, ISSN 1070-9908 Cited by: §I, §III-B, §III-C, §III-C.
  • [23] W. Hu, G. Cheung, A. Ortega, and O. C. Au (2015) Multiresolution graph fourier transform for compression of piecewise smooth images. IEEE Transactions on Image Processing 24 (1), pp. 419–433. Cited by: §III-A.
  • [24] Y. Huang, J. Peng, C. J. Kuo, and M. Gopi (2008) A generic scheme for progressive point cloud coding. IEEE Transactions on Visualization and Computer Graphics 14 (2), pp. 440–453. Cited by: §I, §II-A.
  • [25] J. Kammerl, N. Blodow, R. B. Rusu, S. Gedikli, M. Beetz, and E. Steinbach (2012) Real-time compression of point cloud streams. In 2012 IEEE International Conference on Robotics and Automation, pp. 778–785. Cited by: §II.
  • [26] Z. Karni and C. Gotsman (2000) Spectral compression of mesh geometry. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pp. 279–286. Cited by: §III-A.
  • [27] L. Le Magoarou, R. Gribonval, and N. Tremblay (2017) Approximate fast graph fourier transforms via multilayer sparse approximations. IEEE transactions on Signal and Information Processing over Networks 4 (2), pp. 407–420. Cited by: §V-B4, §V-C.
  • [28] C. Loop, Q. Cai, S. O. Escolano, and P. A. Chou (2016) Microsoft voxelized upper bodies-a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012. Cited by: §VI-A.
  • [29] H. S. Malvar (2006) Adaptive run-length/golomb-rice encoding of quantized generalized gaussian sources with unknown statistics. In Data Compression Conference (DCC’06), pp. 23–32. Cited by: §II-A.
  • [30] K. Mammou (2017-Oct.) PCC test model category 2 v0. In ISO/IEC JTC1/SC29/WG11 (MPEG) output document N17248, Cited by: §I, §I, §I, §II-A, §II-B.
  • [31] R. Mekuria, K. Blom, and P. Cesar (2016) Design, implementation and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology PP (99), pp. 1–1. Cited by: §I, §I, §I, §II-A, §II-B.
  • [32] V. Morell, S. Orts, M. Cazorla, and J. Garcia-Rodriguez (2014) Geometric 3d point cloud compression. Pattern Recognition Letters 50, pp. 55–62. Cited by: §I.
  • [33] MPEG (2019-Feb.) Verbal reports from subgroups at 125th meeting. In ISO/IEC JTC1/SC29/WG11 (MPEG) output document w18110, Cited by: §I.
  • [34] T. Ochotta and D. Saupe (2004) Compression of point-based 3d models by shape-adaptive wavelet coding of multi-height fields. Cited by: §II.
  • [35] E. Pavez and A. Ortega (2016) Generalized laplacian precision matrix estimation for graph signal processing. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6350–6354. Cited by: §IV-C.
  • [36] J. Rissanen and G. G. Langdon (1979) Arithmetic coding. IBM Journal of research and development 23 (2), pp. 149–162. Cited by: §VI-A.
  • [37] H. Rue and L. Held (2005) Gaussian markov random fields: theory and applications. Chapman and Hall/CRC. Cited by: §I, §IV-B2.
  • [38] R. B. Rusu and S. Cousins (2011) 3D is here: point cloud library (pcl). In IEEE International Conference on Robotics and Automation (ICRA), pp. 1–4. Cited by: §I, §II.
  • [39] R. Schnabel and R. Klein (2006) Octree-based point-cloud compression.. In Eurographics Symposium on Point-Based Graphics, pp. 111–120. Cited by: §I, §II.
  • [40] G. Shen, W.-S. Kim, S. K. Narang, A. Ortega, J. Lee, and H. Wey (2010-12) Edge-adaptive transforms for efficient depth map coding. In IEEE Picture Coding Symposium, Nagoya, Japan, pp. 566–569. Cited by: §III-A.
  • [41] G. Shen, W. Kim, S. K. Narang, A. Ortega, J. Lee, and H. Wey (2010) Edge-adaptive transforms for efficient depth map coding. In IEEE Picture Coding Symposium (PCS), pp. 566–569. Cited by: §III-A.
  • [42] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30 (3), pp. 83–98. Cited by: §I, §IV-C.
  • [43] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand (2012) Overview of the high efficiency video coding (hevc) standard. IEEE Transactions on circuits and systems for video technology 22 (12), pp. 1649–1668. Cited by: §I, §II-A.
  • [44] G. J. Sullivan and T. Wiegand (1998) Rate-distortion optimization for video compression. IEEE Signal Processing Magazine 15 (6), pp. 74–90. Cited by: §V-D.
  • [45] D. Thanou, P. A. Chou, and P. Frossard (2016) Graph-based compression of dynamic 3d point cloud sequences. IEEE Transactions on Image Processing 25 (4), pp. 1765–1778. Cited by: §I, §I, §II-B.
  • [46] D. Tian (2017-04) Updates and integration of evaluation metric software for pcc. In ISO/IEC JTC1/SC29/WG11 MPEG2016/M40522, Cited by: §VI-A.
  • [47] C. Tulvan, R. Mekuria, and Z. Li (2016-06) Use cases for point cloud compression (PCC). In ISO/IEC JTC1/SC29/WG11 (MPEG) output document N16331, Cited by: §I.
  • [48] D. V. Vranic, D. Saupe, and J. Richter (2001) Tools for 3d-object retrieval: karhunen-loeve transform and spherical harmonics. In 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No. 01TH8564), pp. 293–298. Cited by: §IV-B2.
  • [49] G. K. Wallace (1992) The jpeg still picture compression standard. IEEE transactions on consumer electronics 38 (1), pp. xviii–xxxiv. Cited by: §I, §II-A.
  • [50] Y. Xu, W. Hu, S. Wang, X. Zhang, S. Wang, S. Ma, and W. Gao (2018) Cluster-based point cloud coding with normal weighted graph fourier transform. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1753–1757. Cited by: §I, §I, §I, §II-A, §IV-A1, §V-C, §V, §VI-A.
  • [51] Y. Xu, S. Wang, X. Zhang, S. Wang, N. Zhang, S. Ma, and W. Gao (2017) Rate-distortion optimized scan for point cloud color compression. In IEEE Visual Communications and Image Processing (VCIP), Cited by: §V-D2.
  • [52] C. Zhang, D. Florêncio, and P. A. Chou (2015) Graph signal processing-a probabilistic framework. Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-2015-31. Cited by: §IV-C.
  • [53] C. Zhang, D. Florencio, and C. Loop (2014) Point cloud attribute compression with graph transform. In IEEE International Conference on Image Processing (ICIP), pp. 2066–2070. Cited by: §I, §I, §II-A, §V-B1, §V-C.
  • [54] C. Zhang and D. Florêncio (2013) Analyzing the optimality of predictive transform coding using graph-based models. IEEE Signal Processing Letters 20 (1), pp. 106–109. Cited by: §I, §III-A, §IV-B2.
  • [55] X. Zhang, S. Wang, K. Gu, W. Lin, S. Ma, and W. Gao (2017) Just-noticeable difference-based perceptual optimization for jpeg compression. IEEE Signal Processing Letters 24 (1), pp. 96–100. Cited by: §V-D.