I Introduction
With the development of depth sensing and 3D graphics technology, 3D dynamic point clouds have attracted intensive attention for representing real scenes or 3D objects in motion. A dynamic point cloud consists of a sequence of static point clouds, each of which is composed of a set of points with 3D coordinates representing the geometric information, as shown in Fig. 1. Attribute information, such as colors and normals, is often associated with each point to enrich its functions. Owing to this efficient representation, 3D dynamic point clouds have been widely applied in various fields, such as 3D immersive telepresence, navigation for autonomous vehicles, gaming, and animation [47].
Nevertheless, the large amount of data in 3D dynamic point clouds significantly increases the burden of transmission and storage, especially with multiple attributes on each point. Further, unlike images or videos, 3D points are acquired by directly sampling the surfaces of real objects, which usually leads to an irregular point distribution. Also, the number of points in each frame mostly varies over time. Hence, it is quite challenging to represent dynamic point clouds compactly. In the face of such necessity and challenges of point cloud coding, the 3D Graphics (3DG) group of the Moving Picture Experts Group (MPEG) has started the standardization of point cloud compression (PCC) (see https://mpeg.chiariglione.org/standards/mpegi/pointcloudcompression).
While many approaches have been proposed to support efficient compression of static point clouds, including geometry coding [32, 39, 38] and attribute coding [53, 50, 9], few efforts [45, 30, 2, 10] have been made for dynamic point clouds. Also, existing dynamic point cloud coding methods have focused more on the coding of geometry [45, 33, 10], but less on specialized inter-coding tools for attributes [45, 10]. In practice, attributes are critical in providing information for specific applications, e.g., colors and normals for visualization, as well as in rendering point clouds with high quality. Hence, we focus on the attribute coding of dynamic point clouds in this paper.
Recent works on the intra-coding of point cloud attributes can be classified into two categories: 1) Codec-based point cloud compression [31, 30], which reorders point clouds into regular samples and then deploys existing image/video coding tools (e.g., JPEG [49], HEVC [43]). However, this class of methods relies on mature image/video codecs regardless of the intrinsic characteristics of 3D point clouds. 2) Geometry-based point cloud compression [24, 9, 53, 50], which designs geometry-based transforms tailored for irregular point clouds. Among them, the Region-adaptive Hierarchical Transform (RAHT) [9] is the state-of-the-art intra-frame compression method for point cloud attributes, which devises a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet, followed by arithmetic coding.

In order to further compress a dynamic point cloud, inter-coding is required to reduce the temporal redundancy. Similar to intra-coding, there exist two major categories for exploiting the temporal correlation in point clouds: 1) Codec-based methods [30], where the inter-coding of dynamic point clouds is converted to that of HEVC. However, certain features of point clouds may be lost during the process of projecting dynamic point clouds to videos. 2) Geometry-based methods [31, 45], which search the temporal correspondence between neighboring frames of point clouds based on the geometry information, via the Iterative Closest Point (ICP) algorithm [6, 3, 45] or feature matching [31]. Nevertheless, either complicated regions of point clouds are difficult to register via ICP, or the temporal correlation is not fully utilized when encoding the inter-prediction residual, leading to sub-optimal inter-coding.
In order to fully exploit intrinsic temporal correlations for compact representation, we propose optimal inter-prediction and predictive transform coding with refined motion estimation for the attributes of dynamic point clouds. Firstly, assuming a Gaussian Markov Random Field (GMRF) model [37] and a spatio-temporal graph representation for dynamic point clouds, we derive the optimal inter-prediction and predictive transform coding for the prediction residual, which depends on the precision matrix of the GMRF model, similar to [54]. However, it is often complicated to estimate the precision matrix statistically. Instead, we interpret the precision matrix as the generalized graph Laplacian matrix in spectral graph theory [8, 42], which is efficient to compute. (In spectral graph theory, the graph Laplacian is an algebraic representation of the connectivity and degree of a graph; there are different variants of Laplacian matrices, and the generalized graph Laplacian, formally defined in Sec. III, is one of them.) The generalized graph Laplacian essentially encodes not only the spatial correlation within each frame, but also the temporal dependency across neighboring frames as the boundary condition of the current frame. This leads to a generalized variant of the Graph Fourier Transform (GFT), referred to as the generalized GFT (GGFT) [22], as the optimal predictive transform. The GGFT is an adaptive transform computed from a graph underlying the signal, and it provably decorrelates the dynamic point cloud optimally, both spatially and temporally, under the assumed GMRF model.
Secondly, we address the challenge of searching for the temporal correspondence between neighboring frames of irregular point clouds via the proposed refined motion estimation. Specifically, we first segment each frame into clusters based on the spatial correlation of the geometry, which serve as the processing units for efficient representation. For each target cluster in the current frame, in order to address the challenge that collocated clusters exhibit irregular shapes and different numbers of points, we build a bounding cube around the cluster and register it with the reference frame via ICP. Different from [31], we register a cluster against a larger, temporally collocated cube, which provides a sufficient feasible search space of corresponding points. Further, we search for the temporal correspondence based on the point-to-point Euclidean distance. This yields reference points for the subsequent inter-prediction and transform, and is able to handle complicated regions.
Finally, we design a complete framework for dynamic point cloud attribute compression with an offline-trained Q model for rate-distortion optimization. Specifically, the framework consists of two coding modes to fully exploit spatial-temporal dependencies: our previous intra-coding [50] as one mode and the proposed inter-coding as the other, both of which are based on GFTs. We determine the optimal coding mode via rate-distortion optimization to achieve the best trade-off between coding rate and attribute distortion. Further, we establish a Q model to derive the Lagrange multiplier offline for the best trade-off, according to the statistics of attributes in dynamic point clouds. Experimental results show that we outperform the state-of-the-art Region-adaptive Hierarchical Transform method [9] by 13.4% in bit rate reduction on average for the luminance component. This validates the effectiveness of the proposed refined motion estimation, optimal inter-prediction and predictive GGFT coding for the spatial and temporal decorrelation of dynamic point clouds.
In summary, the main contributions of our work include:

We propose optimal inter-prediction and predictive transform coding under a GMRF model for the attributes of dynamic point clouds. We derive that the optimal predictive transform is the GGFT, which fully decorrelates the inter-prediction residual.

We propose refined motion estimation via efficient registration prior to inter-prediction, which addresses the challenge of searching for the temporal correspondence between neighboring frames of irregular point clouds.

We present a complete framework for dynamic point cloud attribute compression, consisting of our previous intra-coding and the proposed inter-coding. The optimal coding mode is determined by rate-distortion optimization with the proposed offline-trained Q model.
The remainder of this paper is organized as follows. Sec. II reviews related work on point cloud compression. Sec. III introduces relevant concepts in spectral graph theory and the generalized GFT. Next, we elaborate on the proposed optimal inter-prediction and predictive transform in Sec. IV. Then we discuss the complete coding framework with refined motion estimation and the Q model in Sec. V. Experimental results and conclusions are presented in Sec. VI and Sec. VII, respectively.
II Related Work
Previous point cloud compression has focused more on the coding of geometry, such as the spanning tree [16], the k-d tree structure [11, 34], and the octree [38, 39, 25], which organize unstructured point clouds into regular grids. Among them, the octree approach, a 3D counterpart of the quadtree for 2D images, is widely adopted nowadays. Recently, the coding of attributes has been drawing increasing attention. We discuss previous works on intra-coding and inter-coding of attributes in order.
II-A Intra-coding of point cloud attributes
The intra-coding of point cloud attributes can be divided into two categories: 1) Codec-based methods, which leverage existing image/video codecs by projecting different viewpoints of point clouds onto depth maps and texture maps. 2) Geometry-based methods, which design transforms based on the geometry and perform transform coding on the attributes.
In the first category, Mekuria et al. [31] employ JPEG [49] to encode the color information, which is padded into a map via scanning. Mammou et al. [30] introduce video-based point cloud compression (V-PCC), encoding the projected color and depth images via HEVC [43], which further improves the performance. However, the scanning or projection process inevitably introduces losses in the structure and details of point clouds. Also, the characteristics of point clouds are not taken into consideration when encoding via existing image/video codecs.

In the second category, Huang et al. [24] explore color attribute compression and verify that there is considerable redundancy in the color representation within the octree structure. G-PCC [1] is proposed as an open platform by MPEG 3DG, and the Region-adaptive Hierarchical Transform (RAHT) [9] is adopted as one of its attribute coding tools. Specifically, Queiroz et al. compress the color information of point clouds via a hierarchical sub-band transform [9] that resembles an adaptive variation of a Haar wavelet [17]. The RAHT coefficients are entropy encoded based on arithmetic coding (AC), and further on adaptive run-length Golomb-Rice (RLGR) coding [29], which is a much less complex entropy coder. Zhang et al. [53] propose to encode attributes via the GFT [18] over each sub-cloud. However, their edge weight allocation for graphs is inefficient, since it ignores the characteristics of neighboring points. To address this problem, our previous work proposes a Normal-Weighted GFT (NW-GFT) [50]. We first cluster the point cloud according to the distribution of point coordinates, which makes each sub-cloud more correlated within itself. Then we design a novel edge weight allocation method for constructing the GFT by exploiting the similarity of point normals, so that the correlation within each sub-cloud is further removed.
II-B Inter-coding of point cloud attributes
Similarly, the inter-coding can be classified into two categories: 1) Codec-based methods [30], which leverage existing video codecs (e.g., HEVC) but may lose details when projecting dynamic point clouds to videos. 2) Geometry-based methods, which exploit the temporal correlation by registering or matching neighboring point clouds based on geometry information. Mekuria et al. [31] propose to register neighboring frames of point clouds via ICP, and encode the transformation matrix and other supporting information to reduce the bit rate. However, complicated regions of point clouds are difficult to register via ICP. Moreover, they deploy ICP to search for corresponding 3D blocks only if two adjacent frames satisfy certain conditions, such as the deviation in color variance; otherwise, they resort to intra-coding, thus limiting the use of ICP. In [45], Thanou et al. represent a dynamic point cloud by a set of graphs and match features over them to perform motion estimation and predictive coding. However, the feature matching step is computationally expensive. Moreover, the inter-coding residual is encoded based on the spatial correlation within each frame instead of the temporal correlation, leading to sub-optimal inter-coding.

III Background in Graph Fourier Transform
III-A Graph, Graph Laplacian and Graph Fourier Transforms
We consider an undirected graph G = {V, E, W} composed of a vertex set V of cardinality |V| = N, an edge set E connecting vertices, and a weighted adjacency matrix W. W is a real symmetric N x N matrix, where w_{i,j} is the weight assigned to the edge (i, j) connecting vertices i and j. We assume non-negative weights, i.e., w_{i,j} >= 0.
The graph Laplacian matrix is then defined from the adjacency matrix. Among different variants of Laplacian matrices, the combinatorial graph Laplacian used in [40, 20, 21] is defined as L = D - W, where D is the degree matrix, a diagonal matrix with d_{i,i} = \sum_j w_{i,j}.

The combinatorial graph Laplacian is real and symmetric, which means it admits a complete set of orthonormal eigenvectors. The GFT basis U is then the eigenvector set of the Laplacian matrix. For a given signal f \in R^N defined on the graph (i.e., a graph signal), the formal definition of its GFT is

\hat{f} = U^\top f.   (1)

The inverse GFT follows as

f = U \hat{f}.   (2)
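As a small numerical sketch (not part of the paper), the GFT in (1)-(2) can be computed by eigendecomposition of the combinatorial Laplacian; all names below are illustrative:

```python
import numpy as np

def gft_basis(W):
    """Eigenvector basis of the combinatorial Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # eigh returns orthonormal eigenvectors of a symmetric matrix,
    # sorted by ascending eigenvalue (low to high graph frequency).
    _, U = np.linalg.eigh(L)
    return U

# Toy graph: 4 vertices on a path with unit edge weights.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0

U = gft_basis(W)
f = np.array([1.0, 2.0, 3.0, 4.0])   # a graph signal
f_hat = U.T @ f                      # forward GFT, Eq. (1)
f_rec = U @ f_hat                    # inverse GFT, Eq. (2)
```

Since U is orthonormal, the round trip is lossless up to floating-point error.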
III-B The Generalized Graph Laplacian
A generalized Laplacian (or discrete Schrodinger operator) is a symmetric matrix with non-positive off-diagonal entries [4]. Defining a diagonal matrix V, one can rewrite a generalized Laplacian L_g as

L_g = L + V,   (3)

where the diagonal entries of V can be viewed as a potential defined on the vertices, e.g., a boundary condition [22]. It can also be written as

L_g = D_g - W,   (4)

where D_g is the degree matrix of the generalized Laplacian:

D_g = D + V.   (5)
III-C The Generalized Graph Fourier Transforms
The GGFT was first proposed in [22] as an image transform optimized for the intra-prediction residual in images. Following the definition of the GFT, the basis of the GGFT is the eigenvector set U_g of the generalized graph Laplacian L_g. The GGFT is then defined as

\hat{f} = U_g^\top f.   (6)

The inverse GGFT follows as

f = U_g \hat{f}.   (7)
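A minimal sketch of the GGFT in (6)-(7), assuming a path graph with a unit potential added at one boundary vertex (an illustrative choice of V, not the paper's construction):

```python
import numpy as np

n = 8
# Combinatorial Laplacian of a path graph with unit edge weights.
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(W.sum(axis=1)) - W

# Generalized Laplacian L_g = L + V, Eq. (3): here V adds a unit
# "potential" at the first vertex, acting as a boundary condition.
V = np.zeros((n, n))
V[0, 0] = 1.0
L_g = L + V

# GGFT basis = eigenvectors of L_g, used as in Eqs. (6)-(7).
_, U_g = np.linalg.eigh(L_g)
f = np.random.default_rng(0).standard_normal(n)
f_rec = U_g @ (U_g.T @ f)   # forward then inverse GGFT
```

The basis stays orthonormal, so forward plus inverse GGFT reconstructs the signal exactly.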
In the generalized graph Laplacian of [22], the diagonal entries of V correspond to vertices at image block boundaries, with an extra weight added as a function of the expected inaccuracy of intra-prediction. It is shown in [22] that the GGFT can be viewed as an extension of widely used transforms, namely the discrete cosine transform (DCT) and the asymmetric discrete sine transform (ADST) [19].
IV Optimal Inter-prediction and Transform
In this section, we derive the optimal inter-prediction and transform under GMRF modeling of a dynamic point cloud sequence. We first propose the spatio-temporal graph construction, which serves as the underlying graph for the GMRF. The optimal inter-prediction and transform is then deduced from the conditional dependencies encoded in the precision matrix of the GMRF. Since the precision matrix is difficult to estimate in practice, we interpret it as the generalized graph Laplacian, which leads to the final optimal inter-prediction and predictive GGFT.
IV-A Proposed Spatio-temporal Graph Construction
Given an input point cloud sequence P = {P_1, P_2, ..., P_F}, where F is the number of frames in P, we consider two neighboring frames P_{t-1} and P_t. Assuming availability of the geometry at both the encoder and the decoder, as well as a certain correlation between geometry and attributes, we partition the target frame P_t into clusters based on geometry in order to efficiently exploit the temporal correlation. Then we perform motion estimation to acquire the point-to-point correspondence between each cluster in P_t and points in P_{t-1} based on geometry. The details of geometry-based clustering and motion estimation will be discussed in Sec. V-B, while we focus on the graph construction over each cluster here.
Considering the target cluster C_A in P_t, where N is the number of points in C_A, we assume the corresponding points in P_{t-1} form a set C_B. We construct a spatio-temporal graph over C_A and C_B to encode spatio-temporal correlations, which prepares the ground for the subsequent inter-prediction and predictive transform. Specifically, we treat each point as a vertex in the graph, and build connections including spatial connectivities within C_A and temporal connectivities between corresponding points of C_A and C_B. We discuss the spatial and temporal connectivities in order.
IV-A1 Spatial graph connectivities
We construct spatial connectivities in C_A based on global and local features, as in our previous work [50]. We employ the Euclidean distance between 3D coordinates as the global feature to determine the K nearest neighbors of each point, which constitute a local surface. Then we estimate the normal vector of each surface as the local feature to compute edge weights with a Gaussian kernel. More details will be described in Sec. V-C.

IV-A2 Temporal graph connectivities
We build temporal connections based on the point-to-point correspondence acquired from the refined motion estimation. Specifically, we connect each point in the target cluster C_A of P_t to its corresponding point in C_B of P_{t-1}. While it is possible to set the weights of temporal edges with a Gaussian kernel of the distance between the 3D coordinates of corresponding points, we assign all the temporal edge weights as 1 for simplicity.
IV-B Optimal Prediction and Transform under GMRF
We propose the optimal prediction and transform based on the constructed spatio-temporal graph. In particular, we first derive the optimal prediction and transform statistically under GMRF modeling of point clouds.
IV-B1 Preliminaries on Gaussian Markov Random Fields
We model the spatio-temporal correlation of dynamic point clouds with GMRFs. The formal definition of a GMRF is as follows.
Definition: A random vector x = (x_1, ..., x_N)^\top \in R^N is called a GMRF with respect to the graph G = {V, E} with mean \mu and a precision matrix Q (positive definite), if and only if its density has the form

p(x) = (2\pi)^{-N/2} (\det Q)^{1/2} \exp\left( -\frac{1}{2} (x - \mu)^\top Q (x - \mu) \right),   (8)

and

Q_{i,j} \neq 0 \Leftrightarrow (i, j) \in E, \quad \forall i \neq j.   (9)
This definition implies that a signal x modeled by a GMRF follows a multivariate Gaussian distribution, with mean vector \mu and covariance matrix \Sigma = Q^{-1}, the inverse of the precision matrix. The precision matrix is deployed in the definition of the GMRF for its conditional interpretations:

E[x_i | x_{-i}] = \mu_i - \frac{1}{Q_{i,i}} \sum_{j \in N_i} Q_{i,j} (x_j - \mu_j),   (10)

Prec(x_i | x_{-i}) = Q_{i,i},   (11)

where x_{-i} denotes all elements in x except x_i, and N_i represents all nodes that are neighbors of node i in the graph. (10) and (11) give the conditional expectation and precision of x_i given all other elements, based on the parameters of the GMRF, which will be leveraged to derive the optimal inter-prediction and predictive transform.
IV-B2 Optimal prediction and transform
We assume the attributes x_A of C_A and x_B of C_B follow the GMRF model, with means \mu_A and \mu_B, and precision matrices Q_A and Q_B, respectively.
Given the reference set C_B, the inter-prediction problem is essentially predicting x_A from x_B. As discussed in [54], the optimal inter-prediction is the conditional expectation of x_A given x_B under the GMRF model; any other predictor yields a larger expected prediction error. For the resulting prediction residual, the optimal predictive transform basis is the eigenvector set of the conditional precision matrix, i.e., the Karhunen-Loeve Transform (KLT) [48], which optimally decorrelates the residual under the GMRF model.
In particular, we group x_A and x_B into x = [x_A^\top, x_B^\top]^\top, with mean \mu = [\mu_A^\top, \mu_B^\top]^\top and precision matrix Q. Then we partition the precision matrix as

Q = \begin{bmatrix} Q_{AA} & Q_{AB} \\ Q_{BA} & Q_{BB} \end{bmatrix},   (12)

where Q_{AA}, Q_{BB} \in R^{N \times N}, and Q_{BA} = Q_{AB}^\top.
As mentioned in [37], x_A | x_B is also a GMRF, with mean \mu_{A|B} and precision matrix Q_{A|B}, where

\mu_{A|B} = \mu_A - Q_{AA}^{-1} Q_{AB} (x_B - \mu_B),   (13)

Q_{A|B} = Q_{AA}.   (14)
Next, we discuss the specific form of Q_{AB} and Q_{AA} based on our graph construction. According to (9), Q_{AB} represents the temporal connectivity between C_A and C_B. Since we assign the edge weight between each pair of temporally corresponding points in C_A and C_B as 1, we have

Q_{AB} = -I_N,   (15)

where I_N is an N x N identity matrix, and

Q_{AA} = Q_A^s + I_N,   (16)

where Q_A^s encodes the spatial connectivities within C_A, and the identity matrix accounts for the degrees contributed by the unit-weight temporal edges.
Further, assuming zero mean for both x_A and x_B, i.e., \mu_A = \mu_B = 0, we derive the optimal inter-prediction from (13) and (15) as

\hat{x}_A = E[x_A | x_B] = Q_{AA}^{-1} x_B,   (18)

and from (14), the optimal predictive transform for the prediction residual r_A = x_A - \hat{x}_A is the eigenvector basis of the conditional precision matrix

Q_{A|B} = Q_{AA}.   (19)
IV-C Proposed Inter-prediction and Predictive GGFT
The derived optimal prediction and transform in (18) and (19) under the GMRF depend on the precision matrix, which is, however, often difficult to estimate given a single observation of a dynamic point cloud sequence. Instead, we interpret the precision matrix as a generalized graph Laplacian matrix, and thus deploy the generalized graph Laplacian for the optimal inter-prediction and predictive transform.
As discussed in [35, 12, 52], the precision matrix of a general GMRF can be interpreted as a generalized graph Laplacian L_g, i.e.,

Q = L_g.   (20)
Combining (18), (19) and (20), we derive the final optimal inter-prediction as

\hat{x}_A = (L_g^A)^{-1} x_B,   (21)

where L_g^A denotes the generalized graph Laplacian for C_A, i.e., the counterpart of Q_{AA}. As the graph Laplacian is a high-pass filter [42], (L_g^A)^{-1} corresponds to a low-pass filter. Hence, (21) indicates that the prediction of x_A is a low-pass filtered version of x_B.
Accordingly, the optimal predictive transform for the resulting prediction residual r_A = x_A - \hat{x}_A is the GGFT computed from L_g^A:

\hat{r}_A = (U_g^A)^\top r_A,   (22)

where U_g^A is the eigenvector basis of L_g^A. By incorporating temporal dependencies between adjacent frames, the GGFT is optimal in terms of full spatio-temporal decorrelation.
Further, as L_g^A includes both the spatial connectivities within C_A and the temporal connectivities between C_A and C_B as presented in (16), we decompose it into the following for simpler computation:

L_g^A = L_A + V_A,   (23)

where L_A encodes the spatial connectivities of C_A and V_A = I_N encodes the temporal connectivities to C_B. Here V_A corresponds to V in (3), which can be viewed as the boundary condition of each cluster in the current frame.
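Putting (21)-(23) together, a toy sketch of the inter-prediction and residual transform might look as follows; the cluster graph and attribute values are made up for illustration, and this is not the paper's implementation:

```python
import numpy as np

def inter_predict_and_ggft(W_A, x_B):
    """Sketch of Eqs. (21)-(23): predict the target cluster from its
    reference points and build the GGFT basis for the residual."""
    n = W_A.shape[0]
    L_A = np.diag(W_A.sum(axis=1)) - W_A  # spatial Laplacian of the cluster
    L_g = L_A + np.eye(n)                 # Eq. (23): V_A = I (unit temporal edges)
    x_pred = np.linalg.solve(L_g, x_B)    # Eq. (21): low-pass filtered reference
    _, U_g = np.linalg.eigh(L_g)          # Eq. (22): GGFT basis
    return x_pred, U_g

# Toy cluster: 3 points, fully connected with unit spatial weights.
W_A = np.ones((3, 3)) - np.eye(3)
x_B = np.array([10.0, 12.0, 11.0])        # reference attributes (made up)
x_A = np.array([10.5, 11.5, 11.0])        # current attributes (made up)

x_pred, U_g = inter_predict_and_ggft(W_A, x_B)
coeffs = U_g.T @ (x_A - x_pred)           # GGFT of the prediction residual
x_rec = x_pred + U_g @ coeffs             # lossless round trip (no quantization)
```

In the actual codec the coefficients would be quantized and entropy coded; without quantization the round trip is exact.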
V Proposed Coding Framework
Based on the derived optimal inter-prediction and predictive transform, we present a complete coding framework for the attributes of 3D dynamic point clouds, including the proposed inter-coding and our previously proposed intra-coding [50], as shown in Fig. 2. Further, we design a coding mode decision between the inter mode and the intra mode for rate-distortion optimization. Note that we assume the geometry information of point clouds is losslessly coded and available at both the encoder and decoder. In the following, we discuss the procedures of inter-coding and intra-coding, as well as the designed coding mode decision.
V-A Pre-processing: Voxelization
Different point clouds have various scales of size and precision, which inevitably affects the inter-/intra-coding process, as we construct graphs based on local and global features of the geometry. For the sake of generality, we pre-process the input point cloud via voxelization, which maps all the points into bins within a J x J x J cube and thus leads to point clouds with a unified coordinate scale. In our experiments, J is set to 4096. Specifically, a bin (voxel) is regarded as occupied if it contains at least one point; otherwise it is unoccupied. The geometry information of the voxels is then represented by a set of coordinate triples. The attribute (e.g., color) of each voxel is calculated as the average attribute value of all the points within the voxel. At the decoder, we perform de-voxelization on the decoded point clouds so as to comply with the objective evaluation metric of MPEG and keep the geometric scale of the point clouds unchanged.
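A rough sketch of such a voxelization step; the function and parameter names are assumptions, not the paper's implementation:

```python
import numpy as np

def voxelize(xyz, attr, grid=4096):
    """Map points into a grid^3 cube and average the attributes of all
    points falling into the same voxel (illustrative sketch)."""
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    scale = (grid - 1) / (hi - lo).max()        # unify the coordinate scale
    idx = np.floor((xyz - lo) * scale).astype(np.int64)
    # One flat integer key per occupied voxel.
    key = (idx[:, 0] * grid + idx[:, 1]) * grid + idx[:, 2]
    uniq, inv = np.unique(key, return_inverse=True)
    counts = np.bincount(inv)
    # Average attribute value per occupied voxel.
    attr_v = np.zeros((len(uniq), attr.shape[1]))
    np.add.at(attr_v, inv, attr)
    attr_v /= counts[:, None]
    # Recover integer voxel coordinates from the flat keys.
    vz = uniq % grid
    vy = (uniq // grid) % grid
    vx = uniq // (grid * grid)
    return np.stack([vx, vy, vz], axis=1), attr_v

# Tiny demo: the first two points fall into the same voxel.
vox, attr_v = voxelize(
    np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.01], [1.0, 1.0, 1.0]]),
    np.array([[10.0], [20.0], [30.0]]),
    grid=4,
)
```

With grid=4, the two nearby points share voxel (0, 0, 0) and their attributes are averaged to 15, while the far point keeps its own voxel.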
V-B Optimized Inter-coding
As shown in Fig. 3, the optimized inter-coding consists of four steps: 1) segment each frame of the point cloud into clusters, which serve as the processing units, based on geometry; 2) search for the temporal correspondence between each cluster in the current frame and a set of points in the previous frame, i.e., refined motion estimation; 3) construct a spatio-temporal graph over each cluster in the current frame and compute its generalized Laplacian matrix; 4) perform the optimal inter-prediction and predictive transform. We discuss the four steps in the following, with emphasis on the proposed refined motion estimation.
V-B1 Geometry clustering
To mitigate the computational cost of the eigendecomposition of the Laplacian matrix, we partition the input point cloud into small clusters. Instead of uniform spatial partitioning of point clouds [53], which would create many isolated sub-clouds if the point cloud is sparse, we employ K-means clustering [14] based on geometry. The point cloud is divided into K clusters, and we set the average number of points per cluster to a fixed empirical value to balance the coding performance and computational complexity.
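A minimal sketch of geometry-based K-means clustering with Lloyd's algorithm; the average cluster size of 512 is an assumed placeholder, since the paper fixes this value empirically:

```python
import numpy as np

def kmeans_geometry(xyz, avg_pts=512, iters=10):
    """Partition a point cloud by 3D coordinates with Lloyd's K-means,
    choosing K so that clusters hold roughly avg_pts points each."""
    k = max(1, len(xyz) // avg_pts)
    # Deterministic init: K points spread evenly through the array.
    centers = xyz[np.linspace(0, len(xyz) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest center.
        d = np.linalg.norm(xyz[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = xyz[labels == c].mean(axis=0)
    return labels, centers

# Tiny demo: two well-separated blobs split into K = 2 clusters.
pts = np.vstack([np.zeros((50, 3)), np.full((50, 3), 100.0)])
labels, centers = kmeans_geometry(pts, avg_pts=50)
```

Production code would typically use an optimized library routine; this brute-force version is only meant to make the clustering step concrete.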
V-B2 Refined Motion Estimation
In order to efficiently exploit the temporal correlation between two neighboring frames P_{t-1} and P_t, we propose to register the target cluster C_A in P_t with the previous frame via ICP, and then find the point-to-point correspondence in the registered sets.
As demonstrated in Fig. 4, to reduce the registration complexity, we first form a bounding box around C_A and expand it by a certain percentage (set empirically in our experiments) to an enlarged bounding box. We then set a bounding box in P_{t-1} that is collocated with the enlarged one, and employ it to find reference points in P_{t-1}. This serves as a refinement step for motion estimation.
Further, we acquire the point-to-point correspondence between the target cluster C_A in P_t and the reference bounding box in P_{t-1}. Specifically, the correspondence of each point in C_A is its nearest point in the reference bounding box in terms of Euclidean distance. The resulting points form the corresponding set in P_{t-1}, denoted as C_B. As such, we acquire the temporal correspondence between the reference frame P_{t-1} and the current frame P_t.
Note that, while we accommodate neighboring frames P_{t-1} and P_t with different numbers of points, the corresponding sets C_A and C_B contain the same number of points thanks to the proposed refined motion estimation.
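A simplified sketch of the correspondence search; the ICP registration step of the paper is omitted here for brevity, and the 10% box expansion is an assumed placeholder for the percentage used in the experiments:

```python
import numpy as np
from scipy.spatial import cKDTree

def refined_correspondence(cluster_xyz, ref_xyz, expand=0.1):
    """Expand the target cluster's bounding box by `expand`, keep only
    reference points inside the collocated box, and match each target
    point to its nearest reference point (point-to-point Euclidean)."""
    lo, hi = cluster_xyz.min(axis=0), cluster_xyz.max(axis=0)
    pad = (hi - lo) * expand
    lo, hi = lo - pad, hi + pad
    inside = np.all((ref_xyz >= lo) & (ref_xyz <= hi), axis=1)
    candidates = ref_xyz[inside]
    if len(candidates) == 0:          # fall back to the whole reference frame
        candidates = ref_xyz
    # Nearest reference point per target point.
    _, nn = cKDTree(candidates).query(cluster_xyz, k=1)
    return candidates[nn]             # same cardinality as the target cluster

# Tiny demo: the far-away reference point is excluded by the box.
cur = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
ref = np.array([[0.1, 0.0, 0.0], [0.9, 1.0, 1.0], [50.0, 50.0, 50.0]])
matched = refined_correspondence(cur, ref)
```

The output always has one reference point per target point, matching the property noted above.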
V-B3 Graph Construction

We construct the spatio-temporal graph over each target cluster C_A and its corresponding set C_B as described in Sec. IV-A, with spatial edges within C_A and unit-weight temporal edges between corresponding points of C_A and C_B.
V-B4 Inter-prediction and Predictive Transform
Following the graph construction, we compute the generalized Laplacian accordingly, as in (23). Then we calculate the optimal prediction via (21), acquire the residual signal, and compute the optimal predictive GGFT of the residual via (22). Note that we may employ the fast GFT algorithm in [27] to accelerate the computation of the GGFT. The resulting transform coefficients are quantized, entropy encoded and transmitted to the decoder.
V-C Intra-coding with Normal-Weighted GFT
As illustrated in Fig. 5, we adopt our previous algorithm in [50] for intra-coding. The key idea is to capture structural similarities by using a Gaussian kernel of normals as edge weights, from which the GFT is computed for compact representation. Specifically, we first perform geometry clustering as in the optimized inter-coding, and then calculate the Normal-Weighted Graph Fourier Transform (NW-GFT) for each cluster.
In the NW-GFT, we propose a novel edge weight allocation method for graph construction that exploits the similarity of normal vectors in the local space, so as to further remove the correlation within each cluster. Specifically, we first build a neighborhood graph, which connects all points whose pairwise distances are smaller than a threshold \tau. The choice of \tau depends on the density of the point cloud for capturing the local structure, which we detail in the experimental setup. Next, we compute the normal vector n_i of a K-nearest-neighbor local space around each point i. The normal is estimated by decomposing the covariance matrix of the neighborhood [15], and serves as a local feature. The edge weight between points i and j is then assigned as

w_{i,j} = \exp\left( -\theta_{i,j}^2 / \sigma^2 \right),   (24)

where \theta_{i,j} is the angle between the two normal vectors n_i and n_j, and \sigma is a parameter of the Gaussian kernel; both \tau and \sigma are set empirically in our experiments. The proposed edge weight function in (24) is more robust than the commonly used Euclidean distance between coordinates [53], since it considers the features of each point and its neighborhood via normals.
Based on the above graph construction, we compute the combinatorial graph Laplacian to acquire the GFT basis for each cluster. Similar to the inter-coding, we may deploy the fast GFT [27] to reduce the computational complexity of the GFT. The resulting transform coefficients are quantized, entropy encoded and transmitted to the decoder.
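A naive O(N^2) sketch of the edge weight allocation in (24); the values of tau and sigma are assumed placeholders, and taking the absolute cosine to resolve the sign ambiguity of normals is our assumption, not necessarily the paper's choice:

```python
import numpy as np

def normal_weighted_adjacency(xyz, normals, tau=0.5, sigma=0.3):
    """Connect points closer than tau and weight each edge by a Gaussian
    kernel of the angle between the points' unit normal vectors."""
    n = len(xyz)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(xyz[i] - xyz[j]) < tau:
                # Angle between unit normals, robust to rounding and to
                # the arbitrary sign of estimated normals.
                c = np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0)
                theta = np.arccos(abs(c))
                W[i, j] = W[j, i] = np.exp(-theta**2 / sigma**2)
    return W

# Tiny demo: parallel normals give weight 1; distant points stay unconnected.
xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [10.0, 0.0, 0.0]])
nrm = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
W = normal_weighted_adjacency(xyz, nrm, tau=1.0, sigma=0.3)
```

The resulting adjacency matrix feeds directly into the combinatorial Laplacian L = D - W used for the GFT basis.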
V-D Coding Mode Decision
Similar to traditional video coding, given the target bit rate, the goal of attribute compression of dynamic point clouds is to convey the attribute information with the minimum possible distortion. In order to achieve the best rate-distortion (RD) performance, a coding mode should be determined from the intra and inter modes to reach the best balance between rate and distortion. This can be cast into the classical RD framework, which is formulated as pursuing the best quality under the limitation of a given target rate [44, 55]:

\min_m D_m, \quad \text{s.t.} \ R_m \le R_T,   (25)

where R_T indicates the target bits, and D_m and R_m denote the distortion and coding bits of the m-th coding mode, respectively. This constrained problem can be converted into the unconstrained Lagrangian rate-distortion optimization problem

\min_m J_m = D_m + \lambda R_m,   (26)
where J_m is the RD cost of mode m, and \lambda is the Lagrange multiplier which controls the trade-off between rate and distortion. In particular, \lambda is determined by the quantization factor Q, leading to a Q model. Given \lambda, the optimal mode with the lowest RD cost can be identified. We discuss the RD calculation and the Q model in detail below.
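The mode decision of (26) reduces to a one-line comparison; a sketch with made-up distortion/rate pairs:

```python
def best_mode(candidates, lam):
    """Pick the coding mode with the lowest Lagrangian cost
    J = D + lam * R, Eq. (26). `candidates` maps a mode name to
    its (distortion, rate) pair."""
    costs = {m: d + lam * r for m, (d, r) in candidates.items()}
    return min(costs, key=costs.get), costs

# Toy example: inter spends fewer bits at slightly higher distortion.
modes = {"intra": (2.0, 10.0), "inter": (2.5, 4.0)}
mode, costs = best_mode(modes, lam=0.5)
```

Here J_intra = 2.0 + 0.5 * 10.0 = 7.0 and J_inter = 2.5 + 0.5 * 4.0 = 4.5, so the inter mode wins at this lambda; a smaller lambda (distortion-dominated) would flip the decision toward intra.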
V-D1 Rate and Distortion Calculation
We calculate the coding rate R_m of mode m in terms of bytes per voxel after encoding the attributes of a frame. The distortion D_m is then obtained as the average of the Mean Squared Error (MSE) in the YUV space:

D_m = \frac{D_Y + D_U + D_V}{3},   (27)

where D_Y, D_U and D_V denote the distortion between the original and reconstructed frames in the Y, U and V components, respectively.
V-D2 Q model
In traditional image and video compression, the Lagrange multiplier \lambda as a function of Q can be trained offline from the statistics of images or videos. Analogously, in the context of point cloud compression, we learn an appropriate Q model for efficient RD optimization. Setting the derivative of J_m in (26) with respect to the quality factor Q to 0,

\frac{dJ_m}{dQ} = \frac{dD_m}{dQ} + \lambda \frac{dR_m}{dQ} = 0,   (28)

we have

\lambda = -\frac{dD_m}{dR_m}.   (29)
This implies that \lambda characterizes the slope of the rate-distortion curve. In our previous work [51], we presented the rate-distortion curves of different static point clouds, and showed that the rate-distortion relationships of various static point clouds are quite close. Therefore, we derive the Q model offline based on the statistics of point clouds. By discretizing the rate and distortion points in (29), we approximate the slope of the rate-distortion curve using neighboring rate and distortion points as

\lambda \approx -\frac{D_{i+1} - D_i}{R_{i+1} - R_i},   (30)

where (R_i, D_i) and (R_{i+1}, D_{i+1}) are neighboring measurements on the rate-distortion curve.
Given different dynamic point clouds, the relationships between \lambda and the quality factor Q for these sequences are plotted in Fig. 6. We obtain a best approximation of \lambda with a power function of Q as

\lambda = a \cdot Q^b,   (31)

where a and b are constants fitted offline.
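A sketch of how the power-law Q model in (31) could be fitted offline from measured (Q, lambda) pairs, via least squares in log-log space; the data points below are synthetic, and the fitted constants are illustrative, not the paper's values:

```python
import numpy as np

def fit_lambda_q(q, lam):
    """Fit lam ~ a * q**b by linear least squares on log(lam) vs log(q),
    mirroring how Eq. (31) could be derived from measured RD slopes (30)."""
    b, log_a = np.polyfit(np.log(q), np.log(lam), 1)
    return np.exp(log_a), b

# Synthetic measurements that lie exactly on lam = 2 * Q**1.5.
q = np.array([4.0, 8.0, 16.0, 32.0])
a, b = fit_lambda_q(q, 2.0 * q**1.5)
```

On exact power-law data the fit recovers a = 2 and b = 1.5; with real measured slopes the fit would only approximate the curve.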
Based on the RD optimization and Q model, the best mode for each cluster can be determined.
VI Experimental Results
VI-A Experimental Setup
To validate the proposed complete framework for attribute compression of dynamic 3D point clouds, we compare with two competitive methods: 1) the Region-Adaptive Hierarchical Transform (RAHT) [9] (implementation available at https://github.com/digitalivp/RAHT), which has been adopted in geometry-based point cloud compression (G-PCC), a widely deployed open-source point cloud compression platform introduced by the 3D Graphics (3DG) group of MPEG, and is the state-of-the-art method for attribute coding of static point clouds [1]; 2) our previous intra-coding method, the Normal-Weighted Graph Fourier Transform (NW-GFT) [50]. Both RAHT and NW-GFT were proposed for attribute coding of static point clouds, so we run them on each frame of a dynamic point cloud sequence separately for comparison. Also, both methods are geometry-based, like our proposal, for a fair comparison.
We conduct experiments on two sources of datasets: 4 MPEG sequences (Longdress, Loot, Redandblack and Soldier) from [13] and 5 MSR sequences (Andrew9, David9, Phil9, Ricardo9 and Sarah9) from [28]. For the convenience of experiments, 16 frames are selected from each sequence and the group of pictures (GOP) size is set to 8. The Low-delay P (LDP) configuration is adopted in our experiments. We set the parameter \tau of the neighborhood graph used in Sec. V-C to 50 for the first dataset and 300 for the second dataset, due to the different densities of the point clouds. The reconstruction quality is calculated via the evaluation metric software for PCC in MPEG [46]. An arithmetic coder [36] is used for entropy coding, and the bit rate is measured in bits per input point (BPIP).
Table I. Bit rate reduction (BD-BR) of the proposed method against NW-GFT.

Point Clouds   BD-BR (Y)   BD-BR (U)   BD-BR (V)
Longdress      3.8%        2.7%        2.8%
Loot           19.0%       17.9%       18.6%
Redandblack    5.1%        4.3%        5.0%
Soldier        14.1%       13.0%       13.5%
Andrew         6.9%        6.4%        6.0%
David          17.7%       19.3%       19.1%
Phil           5.5%        3.9%        4.9%
Ricardo        21.4%       23.9%       22.4%
Sarah          16.6%       17.0%       17.2%
Average        12.2%       12.0%       12.2%
Table II. Bit rate reduction (BD-BR) of the proposed method against RAHT.

Point Clouds   BD-BR (Y)   BD-BR (U)   BD-BR (V)
Longdress      10.0%       21.8%       18.3%
Loot           8.2%        29.0%       32.3%
Redandblack    14.8%       17.5%       25.1%
Soldier        13.7%       17.3%       19.0%
Andrew         4.4%        1.0%        3.7%
David          13.9%       13.5%       9.3%
Phil           6.6%        3.9%        3.8%
Ricardo        22.9%       26.0%       26.2%
Sarah          25.8%       32.1%       26.2%
Average        13.4%       17.8%       17.4%
VI-B Experimental Results
VI-B1 Objective Comparison
Fig. 7 shows the RD curves of the different point cloud compression methods. Our method outperforms NW-GFT and RAHT on all test sequences over a large range of BPIP. Specifically, compared with NW-GFT, we reduce the bit rate by 12.2%, 12.0% and 12.2% on average for the Y, U and V components respectively, as listed in Table I. These numbers are computed with the BD-BR metric [5], which quantifies the difference between two RD curves. Also, comparing the measurement points on the RD curves of the proposal and NW-GFT, we observe that the proposal significantly reduces the bit rate with little quality loss, owing to the efficient inter-coding mode. This verifies that the proposed optimal inter-prediction and GGFT lead to a more compact representation of dynamic point clouds.
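For reference, the BD-BR metric [5] fits each RD curve with a cubic polynomial of log-rate as a function of quality (PSNR), and averages the horizontal gap between the two fits over their common quality range. The following is a minimal generic sketch of that computation (assuming four RD points per curve, as is typical), not the exact software used in the experiments:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bit rate (%): average bit-rate difference of the
    test RD curve against the anchor over their common PSNR range."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Average the horizontal (log-rate) gap over the overlap interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

A negative BD-BR means the test codec needs fewer bits than the anchor at the same quality.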
Compared with the state-of-the-art method RAHT, we reduce the bit rate by 13.4%, 17.8% and 17.4% on average for the Y, U and V components respectively, as presented in Table II. Although both methods are geometry-based, our proposed optimal inter-prediction and GGFT further remove temporal redundancy, thus achieving a higher compression ratio. In particular, for point clouds with slow motion and simple texture, such as Ricardo and Sarah, the proposal achieves more than 20% bit rate reduction. Even for point clouds with richer texture, such as Andrew and Phil, the proposal achieves 4.4% and 6.6% bit rate reduction over RAHT, respectively. Besides, due to the overhead of side information such as coding modes, our performance improvement becomes smaller at low bit rates for the sequence Loot.
VI-B2 Subjective Comparison
To evaluate the subjective performance, we compare the reconstructed results of the proposed method with those of the other methods at similar bit rates in Fig. 8. The proposed algorithm not only preserves more details in regions with abundant texture, but also avoids artifacts in smooth regions. Specifically, as illustrated in Fig. 8, the reconstructed results of NW-GFT and RAHT exhibit obvious blocking artifacts, while our results mitigate such artifacts significantly. Further, the proposal preserves high-frequency texture better than NW-GFT and RAHT, as shown in the reconstructed gun of Soldier.
VII Conclusion
We propose a complete compression framework for the attributes of 3D dynamic point clouds, assuming the geometry is available at both the encoder and the decoder. We represent point clouds on graphs and model them with a GMRF, from which we derive optimal inter-prediction and predictive transforms as Generalized Graph Fourier Transforms (GGFT) for temporal decorrelation. Also, we remove spatial redundancy by the Normal-Weighted Graph Fourier Transform (NW-GFT) in the intra-coding mode. The optimal coding mode is then determined by rate-distortion optimization with the proposed offline-trained λ-Q model. Extensive experimental results show that the proposed framework significantly outperforms state-of-the-art point cloud compression methods, validating its decorrelation effectiveness.
References
[1] (2019-03) Information technology — MPEG-I (coded representation of immersive media) — Part 9: Geometry-based point cloud compression. ISO/IEC JTC1/SC29/WG11 (MPEG) output document w18179.
[2] (2016) Compression of dynamic 3D point clouds using subdivisional meshes and graph wavelet transforms. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6360–6364.
[3] (1992) Method for registration of 3-D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, Vol. 1611, pp. 586–607.
[4] (2005) Nodal domain theorems and bipartite subgraphs.
[5] (2008-07) Improvement of BD-PSNR model. VCEG document VCEG-AI11.
[6] (1992) Object modelling by registration of multiple range images. Image and Vision Computing 10 (3), pp. 145–155.
[7] (2011) Depth map coding using graph based transform and transform domain sparsification. In IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6.
[8] (1996) Spectral graph theory. 92 (6), pp. 212.
[9] (2016) Compression of 3D point clouds using a region-adaptive hierarchical transform. IEEE Transactions on Image Processing 25 (8), pp. 3947–3956.
[10] (2017) Motion-compensated compression of point cloud video. In IEEE International Conference on Image Processing (ICIP), pp. 1417–1421.
[11] (2000) Geometric compression for interactive transmission. In Proceedings Visualization 2000 (VIS 2000), pp. 319–326.
[12] (2017) Graph learning from data under Laplacian and structural constraints. IEEE Journal of Selected Topics in Signal Processing 11 (6), pp. 825–841.
[13] (2017-01) 8i voxelized full bodies, version 2 – a voxelized point cloud dataset. ISO/IEC JTC1/SC29/WG11 input document m40059/M74006.
[14] (2005) Automated detection and identification of persons in video using a coarse 3D head model and multiple texture maps. IEE Proceedings - Vision, Image and Signal Processing 152 (6), pp. 902–910.
[15] (1901) LIII. On lines and planes of closest fit to systems of points in space. Vol. 2, pp. 559–572.
[16] (2005) Predictive point-cloud compression. In SIGGRAPH Sketches, pp. 137.
[17] (1910) Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen 69 (3), pp. 331–371.
[18] (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150.
[19] (2012-04) Jointly optimized spatial prediction and block transform for video and image coding. IEEE Transactions on Image Processing 21 (4), pp. 1874–1884.
[20] (2012-09) Depth map compression using multi-resolution graph-based transform for depth-image-based rendering. In IEEE International Conference on Image Processing, Orlando, FL, pp. 1297–1300.
[21] (2015-01) Multi-resolution graph Fourier transform for compression of piecewise smooth images. IEEE Transactions on Image Processing 24 (1), pp. 419–433.
[22] (2015-11) Intra-prediction and generalized graph Fourier transform for image coding. IEEE Signal Processing Letters 22 (11), pp. 1913–1917.
[23] (2015) Multi-resolution graph Fourier transform for compression of piecewise smooth images. IEEE Transactions on Image Processing 24 (1), pp. 419–433.
[24] (2008) A generic scheme for progressive point cloud coding. IEEE Transactions on Visualization and Computer Graphics 14 (2), pp. 440–453.
[25] (2012) Real-time compression of point cloud streams. In IEEE International Conference on Robotics and Automation (ICRA), pp. 778–785.
[26] (2000) Spectral compression of mesh geometry. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 279–286.
[27] (2017) Approximate fast graph Fourier transforms via multilayer sparse approximations. IEEE Transactions on Signal and Information Processing over Networks 4 (2), pp. 407–420.
[28] (2016) Microsoft voxelized upper bodies – a voxelized point cloud dataset. ISO/IEC JTC1/SC29 joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012.
[29] (2006) Adaptive run-length/Golomb-Rice encoding of quantized generalized Gaussian sources with unknown statistics. In Data Compression Conference (DCC'06), pp. 23–32.
[30] (2017-10) PCC test model category 2 v0. ISO/IEC JTC1/SC29/WG11 (MPEG) output document N17248.
[31] (2016) Design, implementation and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology.
[32] (2014) Geometric 3D point cloud compression. Pattern Recognition Letters 50, pp. 55–62.
[33] (2019-02) Verbal reports from subgroups at 125th meeting. ISO/IEC JTC1/SC29/WG11 (MPEG) output document w18110.
[34] (2004) Compression of point-based 3D models by shape-adaptive wavelet coding of multi-height fields.
[35] (2016) Generalized Laplacian precision matrix estimation for graph signal processing. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6350–6354.
[36] (1979) Arithmetic coding. IBM Journal of Research and Development 23 (2), pp. 149–162.
[37] (2005) Gaussian Markov Random Fields: Theory and Applications. Chapman and Hall/CRC.
[38] (2011) 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), pp. 1–4.
[39] (2006) Octree-based point-cloud compression. In Eurographics Symposium on Point-Based Graphics, pp. 111–120.
[40] (2010-12) Edge-adaptive transforms for efficient depth map coding. In IEEE Picture Coding Symposium, Nagoya, Japan, pp. 566–569.
[41] (2010) Edge-adaptive transforms for efficient depth map coding. In IEEE Picture Coding Symposium (PCS), pp. 566–569.
[42] (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30 (3), pp. 83–98.
[43] (2012) Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22 (12), pp. 1649–1668.
[44] (1998) Rate-distortion optimization for video compression. IEEE Signal Processing Magazine 15 (6), pp. 74–90.
[45] (2016) Graph-based compression of dynamic 3D point cloud sequences. IEEE Transactions on Image Processing 25 (4), pp. 1765–1778.
[46] (2017-04) Updates and integration of evaluation metric software for PCC. ISO/IEC JTC1/SC29/WG11 input document MPEG2016/M40522.
[47] (2016-06) Use cases for point cloud compression (PCC). ISO/IEC JTC1/SC29/WG11 (MPEG) output document N16331.
[48] (2001) Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics. In IEEE Fourth Workshop on Multimedia Signal Processing, pp. 293–298.
[49] (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38 (1), pp. xviii–xxxiv.
[50] (2018) Cluster-based point cloud coding with normal weighted graph Fourier transform. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1753–1757.
[51] (2017) Rate-distortion optimized scan for point cloud color compression. In IEEE Visual Communications and Image Processing (VCIP).
[52] (2015) Graph signal processing – a probabilistic framework. Microsoft Research, Redmond, WA, USA, Tech. Rep. MSR-TR-2015-31.
[53] (2014) Point cloud attribute compression with graph transform. In IEEE International Conference on Image Processing (ICIP), pp. 2066–2070.
[54] (2013) Analyzing the optimality of predictive transform coding using graph-based models. IEEE Signal Processing Letters 20 (1), pp. 106–109.
[55] (2017) Just-noticeable difference-based perceptual optimization for JPEG compression. IEEE Signal Processing Letters 24 (1), pp. 96–100.