1 Introduction
Driven by applications in virtual and augmented reality, remote sensing, and autonomous vehicles, it is now possible to capture, in real time and at low cost, time-varying 3D scenes, public spaces with moving objects, and people. The preferred representation for such data is the 3D point cloud, which consists of 1) a list of 3D point coordinates, and 2) attributes associated with those coordinates, such as color. In many applications, large point clouds need to be compressed for storage and transmission, which has led to the recent development of a standard by the Moving Picture Experts Group (MPEG) [15].
We propose a new transform for attribute compression, which often takes up more than half of the overall bit budget for typical point clouds. Motivated by transforms used in image, video and point cloud compression [19, 1, 7], we construct our transform with the goal of achieving three fundamental properties: 1) orthogonality, 2) a constant basis function, and 3) low complexity. Orthogonality ensures that errors in the transform and point cloud attribute domains are equal. A constant basis function guarantees that an attribute with the same value at all points (the smoothest signal) has the most compact representation (sparse signal) in the transform domain. Finally, for the transform to scale to point clouds with a large number of points, $N$, we require a complexity of $O(N \log N)$, instead of a naive implementation that uses matrix-vector products, which would require $O(N^2)$ complexity if the transform is available explicitly, or $O(N^3)$ if it has to be obtained via eigendecomposition.
We propose the Region Adaptive Graph Fourier Transform (RAGFT), where points are organized as a set of nested partitions represented by a tree. Leaf nodes represent points in the original point cloud, while each internal node represents all points within the corresponding subtree (see Fig. 1). The RAGFT is a multiresolution transform formed by combining spatially localized block transforms, where each block represents a cube in 3D space. Resolution levels are determined by the levels of the tree, with the highest resolution corresponding to the deepest level (representing single points), and the coarsest resolution corresponding to the root (representing all points). At each resolution level, attributes belonging to the same group (points that have the same parent in the tree) are passed through an orthogonal transform for decorrelation. Each block transform produces a single approximation (low pass) coefficient and several detail (high pass) coefficients. The approximation coefficients are promoted to a lower resolution level, where the same process can be repeated until reaching the root. This process can also stop before reaching the root, so that multiple subtrees, each with its own DC coefficient, are stored; this may be more efficient for large point clouds where there is limited correlation across subtrees.
Since each internal node in the tree represents a cluster, possibly containing a different number of points, the block transforms should incorporate the relative importance of the nodes, based on their respective number of descendants. To address this issue, we propose a new graph Fourier transform (GFT) given by the eigenvectors of the normalized graph Laplacian $\mathbf{Q}^{-1/2}\mathbf{L}\mathbf{Q}^{-1/2}$, where $\mathbf{Q}$ is a diagonal matrix whose diagonal terms are the number of descendants of each node, and $\mathbf{L}$ is the combinatorial Laplacian. In contrast to the normalized or combinatorial Laplacian matrices [16], our new variation operator encodes the local geometry (distances between points) in $\mathbf{L}$, as well as the relative importance of a given set of points. The proposed transform is closely related to the Irregularity Aware Graph Fourier Transform (IAGFT) [9].
Multiresolution decompositions for point cloud coding built upon graph filter bank theory have been proposed [2, 17, 3], but they often lack orthogonality, and they build multiresolution representations through complex graph partitioning and reduction algorithms, which makes them impractical for large point clouds. A Haar-like basis was proposed in [8] for any data that can be represented by a hierarchical tree. This construction is orthogonal and has a constant basis function. Although the other basis functions are spatially localized, they do not exploit local geometry information (distances between points). In addition, there is no efficient algorithm for computing transform coefficients, so matrix-vector products need to be used. Also inspired by the Haar transform, [18] proposed subgraph-based filter banks, in which a graph for the data is partitioned into connected subgraphs. For each subgraph, a local Laplacian-based GFT is computed. Although the RAGFT follows a similar strategy based on nested partitions, the local graph construction, local graph transforms, coefficient arrangement, and design goals are different.
RAGFT can be applied to any type of dataset as long as a nested partition is available. For 3D point clouds, a natural choice is the octree decomposition [10, 12], which can be used to implement RAGFT for point clouds of $N$ points with $O(N \log N)$ complexity. This data structure has already been used to design transforms for point cloud attributes [20, 4, 5, 7]. In the block-based graph Fourier transform (blockGFT) [20], the voxel space is partitioned into small blocks, a graph is constructed for the points within each block, and the corresponding graph Fourier transform (GFT) is used to represent attributes in the block. Another popular approach is RAHT [7], where a multiresolution transform is formed by a composition of orthogonal transforms. The blockGFT can achieve excellent performance if the block size is large enough (more points per block), but this has a significant computational cost, since it requires computing GFTs of graphs with possibly hundreds of points. On the other hand, RAHT has an extremely fast implementation, with competitive coding performance. Our proposed RAGFT combines ideas from blockGFT and RAHT: it generalizes the blockGFT approach to multiple levels, while RAHT can be viewed as a special case of RAGFT that is separable and uses only two-point blocks. We demonstrate through point cloud attribute compression experiments that when RAGFT is implemented with small block transforms, it can outperform RAHT by up to 2.5 dB, with a small computational overhead. When the RAGFT is implemented with larger blocks, it outperforms the blockGFT with a negligible complexity overhead.
2 Region Adaptive Graph Fourier Transform
2.1 Notation and preliminaries
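We use lowercase normal (e.g. $a$), lowercase bold (e.g. $\mathbf{a}$) and uppercase bold (e.g. $\mathbf{A}$) for scalars, vectors and matrices respectively. Vectors and matrices may also be denoted using their entries as $\mathbf{a} = [a_i]$, or $\mathbf{A} = [a_{ij}]$. Let $G = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ denote a weighted undirected graph with vertex set $\mathcal{V}$, edge set $\mathcal{E}$ and edge weight matrix $\mathbf{W} = [w_{ij}]$. An edge weight is positive, that is, $w_{ij} > 0$, if and only if $(i, j) \in \mathcal{E}$. The graph has $n$ nodes. Let $\mathbf{D} = \mathrm{diag}(d_i)$, with $d_i = \sum_j w_{ij}$, be the degree matrix and let $\mathbf{L} = \mathbf{D} - \mathbf{W}$ be the combinatorial graph Laplacian matrix. $\mathbf{L}$ is symmetric positive semidefinite, and has eigendecomposition $\mathbf{L} = \mathbf{\Phi}\mathbf{\Lambda}\mathbf{\Phi}^\top$, where the eigenvalues matrix is $\mathbf{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, and the eigenvalues are $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. For connected graphs, $\lambda_2 > 0$. The eigenvector associated to $\lambda_1$ is $\frac{1}{\sqrt{n}}\mathbf{1}$.
2.2 RAGFT block transform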
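The Region-Adaptive Graph Fourier Transform (RAGFT) is an orthonormal transform formed from the composition of smaller dense block transforms. We start by describing the latter. Let $G = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ denote a graph as defined in Section 2.1. In addition, define the node weight matrix $\mathbf{Q} = \mathrm{diag}(q_i)$ ($q_i > 0$), the normalized Laplacian $\hat{\mathbf{L}} = \mathbf{Q}^{-1/2}\mathbf{L}\mathbf{Q}^{-1/2}$, and its eigendecomposition
$$\hat{\mathbf{L}} = \hat{\mathbf{\Phi}}\hat{\mathbf{\Lambda}}\hat{\mathbf{\Phi}}^\top \tag{1}$$
where $\hat{\mathbf{\Phi}}$ is the matrix of eigenvectors of $\hat{\mathbf{L}}$ and $\hat{\mathbf{\Lambda}}$ is the matrix of eigenvalues. Since $\hat{\mathbf{L}}$ is symmetric and positive semidefinite, its eigenvalues are nonnegative and $\hat{\mathbf{\Phi}}$ is orthonormal. Moreover, if we order the eigenvectors by their eigenvalues, and assume a connected graph, we have $0 = \hat{\lambda}_1 < \hat{\lambda}_2 \le \cdots \le \hat{\lambda}_n$ and the first eigenvector is proportional to $\mathbf{Q}^{1/2}\mathbf{1}$. Hence $\hat{\mathbf{\Phi}}^\top$ maps the vector $\mathbf{Q}^{1/2}\mathbf{1}$ to $[\sqrt{q_1 + \cdots + q_n}, 0, \ldots, 0]^\top$. We define $\hat{\mathbf{\Phi}}^\top$ to be the elementary block transform of the RAGFT with inverse $\hat{\mathbf{\Phi}}$.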
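The block transform can be checked numerically on a toy graph. The following is a minimal NumPy sketch, not the authors' implementation: the function name `ragft_block`, the 3-node graph and the node weights are illustrative assumptions. It forms $\mathbf{Q}^{-1/2}\mathbf{L}\mathbf{Q}^{-1/2}$, takes its eigenvectors with `numpy.linalg.eigh`, and verifies that the resulting transform is orthonormal and sends $\mathbf{Q}^{1/2}\mathbf{1}$ to a single nonzero coefficient.

```python
import numpy as np

def ragft_block(W, q):
    """Rows of the returned matrix are eigenvectors of Q^{-1/2} L Q^{-1/2}."""
    W = np.asarray(W, dtype=float)
    q = np.asarray(q, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian D - W
    s = 1.0 / np.sqrt(q)                    # diagonal of Q^{-1/2}
    L_hat = s[:, None] * L * s[None, :]     # Q^{-1/2} L Q^{-1/2}
    _, Phi = np.linalg.eigh(L_hat)          # eigenvalues in ascending order
    if Phi[:, 0].sum() < 0:                 # eigenvector signs are arbitrary;
        Phi[:, 0] *= -1                     # make the low-pass vector positive
    return Phi.T

# Hypothetical connected 3-node graph with unequal node weights.
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])
q = np.array([1.0, 2.0, 5.0])

T = ragft_block(W, q)
dc = T @ np.sqrt(q)                         # transform of Q^{1/2} 1
print("orthonormal:", np.allclose(T @ T.T, np.eye(3)))
print("coefficients of Q^{1/2} 1:", np.round(dc, 6))
```

The sign flip on the first eigenvector is a convenience of this sketch; it only makes the low-pass coefficient positive and does not affect orthonormality.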
2.3 Relation to other transforms
Relation to RAHT.
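As a special case, consider the two-node graph with $\mathcal{V} = \{1, 2\}$, $\mathcal{E} = \{(1, 2)\}$, edge weights $w_{12} = w_{21} = 1$, and node weights $q_1$ and $q_2$. Then $\mathbf{Q} = \mathrm{diag}(q_1, q_2)$, and $\mathbf{L} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, hence the normalized Laplacian is
$$\hat{\mathbf{L}} = \begin{bmatrix} \frac{1}{q_1} & -\frac{1}{\sqrt{q_1 q_2}} \\ -\frac{1}{\sqrt{q_1 q_2}} & \frac{1}{q_2} \end{bmatrix} \tag{2}$$
$$= \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & \hat{\lambda} \end{bmatrix} \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \tag{3}$$
where $a = \sqrt{q_1}/\sqrt{q_1 + q_2}$, $b = \sqrt{q_2}/\sqrt{q_1 + q_2}$, and $\hat{\lambda} = (q_1 + q_2)/(q_1 q_2)$. The matrix $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ is the RAHT butterfly [7]. Hence RAHT is the two-node case of the RAGFT.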
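The two-node case can be verified numerically. The sketch below is an illustration under stated assumptions: a unit edge weight, arbitrary example node weights 3 and 5, and helper names (`raht_butterfly`, `two_node_ragft`) that are hypothetical. It compares the eigenvectors of the $2 \times 2$ normalized Laplacian with the butterfly matrix built from the square roots of the node weights.

```python
import numpy as np

def raht_butterfly(q1, q2):
    """Butterfly [[a, b], [-b, a]] with a, b proportional to sqrt(q1), sqrt(q2)."""
    a, b = np.sqrt(q1), np.sqrt(q2)
    return np.array([[a, b], [-b, a]]) / np.sqrt(q1 + q2)

def two_node_ragft(q1, q2, w=1.0):
    """Eigen-decompose Q^{-1/2} L Q^{-1/2} for a single edge of weight w (assumed 1)."""
    L = w * np.array([[1.0, -1.0], [-1.0, 1.0]])
    s = 1.0 / np.sqrt(np.array([q1, q2], dtype=float))
    L_hat = s[:, None] * L * s[None, :]
    lam, Phi = np.linalg.eigh(L_hat)        # ascending eigenvalues
    T = Phi.T.copy()
    # Eigenvector signs are arbitrary; pick the ones used by the butterfly.
    if T[0, 0] < 0:
        T[0] *= -1
    if T[1, 1] < 0:
        T[1] *= -1
    return lam, T

q1, q2 = 3.0, 5.0                            # example node weights (assumed)
lam, T = two_node_ragft(q1, q2)
print(np.allclose(T, raht_butterfly(q1, q2)))
print(np.isclose(lam[1], (q1 + q2) / (q1 * q2)))
```

Changing the edge weight `w` rescales the nonzero eigenvalue but leaves the eigenvectors, and hence the transform, unchanged.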
Relation to IAGFT.
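Consider $\mathbf{Z} = \mathbf{Q}^{-1}\mathbf{L}$, the fundamental matrix of the IAGFT [9]. Clearly $\mathbf{Z}$ is related to $\hat{\mathbf{L}} = \mathbf{Q}^{-1/2}\mathbf{L}\mathbf{Q}^{-1/2}$ by a similarity transform [9, Remark 1]: $\mathbf{Z} = \mathbf{Q}^{-1/2}\hat{\mathbf{L}}\mathbf{Q}^{1/2}$. Thus $\mathbf{Z}$ and $\hat{\mathbf{L}}$ have the same set of eigenvalues, $\hat{\mathbf{\Lambda}}$, and the matrix of eigenvectors $\mathbf{U}$ of $\mathbf{Z}$ is related to the matrix of eigenvectors $\hat{\mathbf{\Phi}}$ of $\hat{\mathbf{L}}$ by $\mathbf{U} = \mathbf{Q}^{-1/2}\hat{\mathbf{\Phi}}$, that is, $\hat{\mathbf{\Phi}} = \mathbf{Q}^{1/2}\mathbf{U}$. It can be shown [9, Thm. 1] that the columns of $\mathbf{U}$ are orthonormal under the $\mathbf{Q}$-norm, i.e., $\mathbf{U}^\top\mathbf{Q}\mathbf{U} = \mathbf{I}$. The IAGFT is defined as $\mathbf{F} = \mathbf{U}^{-1} = \mathbf{U}^\top\mathbf{Q}$. So $\mathbf{F} = \hat{\mathbf{\Phi}}^\top\mathbf{Q}^{1/2}$, or $\hat{\mathbf{\Phi}}^\top = \mathbf{F}\mathbf{Q}^{-1/2}$. Hence the IAGFT of a signal $\mathbf{x}$ is the RAGFT applied to the weighted signal $\mathbf{Q}^{1/2}\mathbf{x}$.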
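These identities are easy to confirm numerically. The following NumPy sketch uses a hypothetical 3-node graph and node weights (both assumptions chosen only for illustration) and checks that $\mathbf{Q}^{-1}\mathbf{L}$ shares its eigenvalues with $\mathbf{Q}^{-1/2}\mathbf{L}\mathbf{Q}^{-1/2}$, that the rescaled eigenvectors are orthonormal under the $\mathbf{Q}$ inner product, and that the IAGFT of a signal equals the RAGFT block transform of the signal scaled by $\mathbf{Q}^{1/2}$.

```python
import numpy as np

# Hypothetical graph and node weights, for illustration only.
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])
q = np.array([1.0, 2.0, 5.0])

L = np.diag(W.sum(axis=1)) - W              # combinatorial Laplacian
r = np.sqrt(q)                              # diagonal of Q^{1/2}
L_hat = L / np.outer(r, r)                  # Q^{-1/2} L Q^{-1/2}
lam, Phi = np.linalg.eigh(L_hat)            # RAGFT block transform is Phi.T

Z = L / q[:, None]                          # Q^{-1} L (IAGFT fundamental matrix)
U = Phi / r[:, None]                        # Q^{-1/2} Phi
F = U.T * q[None, :]                        # IAGFT: U^T Q

x = np.array([3.0, -1.0, 2.0])              # arbitrary test signal
same_spectrum = np.allclose(Z @ U, U * lam[None, :])
q_orthonormal = np.allclose(U.T @ np.diag(q) @ U, np.eye(3))
weighted_signal = np.allclose(F @ x, Phi.T @ (r * x))
print(same_spectrum, q_orthonormal, weighted_signal)
```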
2.4 Definition of full RAGFT
We now show how the RAGFT block transforms are composed to form the full RAGFT. Consider a list of points , a realvalued attribute signal and node weights on those points:
Here, is an abstract point, and can be considered a node or vertex of a discrete structure. Now suppose we are given a hierarchical partition or nested refinement of the points, as illustrated in Fig. 1. Let be the depth of the hierarchy, and for each level from the root () to the leaves (), let be the th of nodes at level . At level , we have and . For level , let denote the indices of the children of node and let denote the indices of the descendants of . We are also given for all and , a graph , where is the set of children of node , is a set of edges between the children, is a matrix of edge weights, and is the diagonal matrix of child node weights, where is the sum of the weights of all descendants of child . Let be the RAGFT block transform of the graph .
The full RAGFT is a composition of orthogonal transforms, , with inverse , where
(4) 
is an orthonormal block diagonal matrix and block is an orthonormal matrix for transforming the descendants of the th node at level . Specifically,
(5) 
where is an permutation that collects the lowpass coefficients of the child nodes , for processing by the RAGFT block transform to produce lowpass and highpass coefficients for parent node . When the RAGFT is implemented using all levels of the tree, it produces a single approximation or low pass (DC) coefficient, and detail or high pass (AC) coefficients. Figure 2 depicts the application of the RAGFT for the nested partition depicted in Figure 1. It can be seen inductively that maps the signal to a single lowpass (DC) coefficient equal to , while all other, highpass (AC), coefficients are equal to 0. Thus the first, lowpass (DC) basis function of is proportional to . In the usual case when for all , the first basis function of is constant, as desired. This can be verified using Figure 2. Assume the attributes and weights are all equal to . Since the block transforms at level have sizes , and , the only non zero coefficients at level are , and . Then at level , the weight matrix is , hence the first column of is . Then the only nonzero transform coefficient produced at level is . Therefore the RAGFT of the vector is a sparse vector.
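The composition of block transforms over a tree can be sketched recursively. The code below is a simplified illustration, not the paper's implementation: the nested-list tree, the helper names, and the use of a complete graph with unit edge weights among siblings are all assumptions made to keep the example short. Each internal node transforms the low-pass coefficients of its children, keeps the first output as its own low-pass coefficient, and emits the rest as high-pass coefficients.

```python
import numpy as np

def ragft_block(W, q):
    """Block transform: eigenvectors (as rows) of Q^{-1/2} L Q^{-1/2}."""
    L = np.diag(W.sum(axis=1)) - W
    s = 1.0 / np.sqrt(q)
    _, Phi = np.linalg.eigh(s[:, None] * L * s[None, :])
    if Phi[:, 0].sum() < 0:                 # fix the sign of the low-pass vector
        Phi[:, 0] *= -1
    return Phi.T

def complete_graph(n):
    """Assumed sibling graph: all pairs connected with unit weight."""
    return np.ones((n, n)) - np.eye(n)

def ragft(node, a, q):
    """Return (low-pass coeff, total weight, high-pass coeffs) of a subtree."""
    if isinstance(node, int):               # leaf: a single point
        return a[node], q[node], []
    parts = [ragft(child, a, q) for child in node]
    low = np.array([p[0] for p in parts])   # children's low-pass coefficients
    wts = np.array([p[1] for p in parts])   # children's accumulated weights
    high = [c for p in parts for c in p[2]]
    coeffs = ragft_block(complete_graph(len(node)), wts) @ low
    return coeffs[0], wts.sum(), high + list(coeffs[1:])

tree = [[0, 1, 2], [3, 4], [5]]             # hypothetical two-level partition
q = np.ones(6)

dc, total, ac = ragft(tree, np.ones(6), q)  # constant signal
x = np.random.default_rng(0).standard_normal(6)
dc_x, _, ac_x = ragft(tree, x, q)           # generic signal
energy_x = dc_x**2 + sum(c**2 for c in ac_x)
print(dc, np.allclose(ac, 0.0), np.isclose(energy_x, np.sum(x**2)))
```

With unit attributes and unit weights on six points, the only nonzero output is the low-pass coefficient, and the energy check on the random signal reflects the orthogonality of the composed transform.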
Table 1: Complexity estimate of the RAGFT and blockGFT as a function of the block size (four RAGFT configurations and two blockGFT configurations).
3 Application to Point Clouds
In a point cloud, the vertices $v_i$ represent the coordinates of real points in space; the attributes $a_i$ represent colors or other attributes of the points; and the weights $q_i$ represent the relative importance of the points. The weights are usually set to be constant ($q_i = 1$), but may be adjusted to reflect different regions of interest [14]. We assume points are voxelized. A voxel is a volumetric unit of the domain of a 3-dimensional signal, analogous to a pixel in the 2-dimensional case. Let $J$ be a positive integer, and partition the space into $2^J \times 2^J \times 2^J$ voxels. We say a point cloud is voxelized with depth $J$ if all the point coordinates take values in the integer grid $\{0, 1, \ldots, 2^J - 1\}^3$. A voxelized point cloud can be organized into a hierarchical structure. The process is described in [7, 13]. The voxel space is hierarchically partitioned into subblocks of size $b_\ell \times b_\ell \times b_\ell$, where $b_\ell$ is a power of 2. These block sizes allow for a hierarchy with $L$ levels, where $\prod_{\ell=1}^{L} b_\ell = 2^J$. Levels are ordered according to resolution. The partition is constructed by generating a point cloud $\mathcal{V}_\ell$ for each resolution level $\ell$. That is, beginning with $\mathcal{V}_L = \mathcal{V}$, we have for $\ell < L$:
$$\mathcal{V}_\ell = \mathrm{unique}\left(\left\lfloor \mathcal{V}_{\ell+1} / b_{\ell+1} \right\rfloor\right) \tag{6}$$
where the function $\mathrm{unique}$ removes points with equal coordinates. Each point $v_i^\ell$ in $\mathcal{V}_\ell$ has children
$$\left\{ v \in \mathcal{V}_{\ell+1} : \left\lfloor v / b_{\ell+1} \right\rfloor = v_i^\ell \right\} \tag{7}$$
With the children we form a graph , where there is an edge between nodes if the distance between and is less than a threshold, in which case the weight is set to a decreasing function of that distance. Using this hierarchy and set of graphs, the RAGFT is constructed and applied to the attributes. The point hierarchies described in (6) and (7) can be obtained in time [7, 13]. At resolution we need to construct block transforms. Assuming constant block sizes (), the transform coefficients can be computed in time. Since there are levels, the RAGFT has complexity .
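One level of the hierarchy and a sibling graph can be sketched as follows. This is an illustration under assumptions: parents are obtained by integer division of voxel coordinates by the block size followed by duplicate removal, edge weights are the reciprocal of the Euclidean distance (one possible decreasing function), and the function names, coordinates and threshold are hypothetical.

```python
import numpy as np

def coarsen(V_child, b):
    """Map integer voxel coordinates to parents by block size b.

    Returns the unique parent coordinates and, for every child,
    the row index of its parent.
    """
    parents, parent_of = np.unique(V_child // b, axis=0, return_inverse=True)
    return parents, parent_of.reshape(-1)   # reshape guards against 2-D inverse

def sibling_graph(P, thresh):
    """Edge weight 1/distance between siblings closer than thresh (assumed rule)."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    W = np.zeros_like(d)
    mask = (d > 0) & (d < thresh)
    W[mask] = 1.0 / d[mask]
    return W

# Hypothetical voxelized point cloud on a 4 x 4 x 4 grid.
V = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [2, 3, 1],
              [3, 3, 1]])

parents, parent_of = coarsen(V, b=2)
W0 = sibling_graph(V[parent_of == 0], thresh=1.2)   # children of first parent
print(parents.tolist(), parent_of.tolist())
print(W0)
```

Applying `coarsen` repeatedly to the returned `parents` array yields the coarser levels; the matrix `W0` is what a block transform for the first parent would take as its edge weights.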
4 Experiments
In this section, we evaluate the RAGFT in compression of color attributes of the “8iVFBv2” point cloud dataset [6] (available at https://jpeg.org/plenodb/pc/8ilabs/), which consists of four sequences: “longdress”, “redandblack”, “loot” and “soldier”, and compare its performance to that of blockGFT [20] and RAHT [7]. Colors are transformed from the RGB to the YUV space, and each of the Y, U, and V components is processed independently by the transform. For all transforms, we perform uniform quantization and entropy code the coefficients using the adaptive run-length Golomb-Rice algorithm [11]. Distortion for the Y component is given by
$$\mathrm{PSNR}_Y = -10 \log_{10}\left( \frac{1}{T} \sum_{t=1}^{T} \frac{\| Y_t - \hat{Y}_t \|_2^2}{255^2 N_t} \right)$$
where $T$ is the number of frames in the sequence, $N_t$ is the number of points in the $t$th frame, and $Y_t$ and $\hat{Y}_t$ represent the original and decoded signals of frame $t$. Rate is reported in bits per voxel [bpv] as $R = \sum_{t=1}^{T} R_t / \sum_{t=1}^{T} N_t$, where $R_t$ is the number of bits used to encode the YUV components of the $t$th frame.
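The role of orthogonality in this pipeline can be illustrated with a small sketch. Everything here is an assumption for illustration: a random orthogonal matrix stands in for the transform, the quantization step is arbitrary, and the `psnr` helper assumes 8-bit attributes with peak value 255. The point is that, for an orthogonal transform, the squared quantization error measured on the coefficients equals the squared error on the reconstructed attributes.

```python
import numpy as np

def quantize(c, step):
    """Uniform quantization followed by reconstruction at bin centers."""
    return np.round(c / step) * step

def psnr(y, y_hat, peak=255.0):
    """PSNR in dB for a single frame, assuming 8-bit attributes."""
    mse = np.mean((y - y_hat) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
T, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # stand-in orthogonal transform
y = rng.uniform(0.0, 255.0, size=8)               # synthetic luma values

step = 4.0                                        # arbitrary quantization step
c = T @ y                                         # forward transform
c_hat = quantize(c, step)
y_hat = T.T @ c_hat                               # inverse transform

err_coeff = np.sum((c - c_hat) ** 2)              # error in transform domain
err_attr = np.sum((y - y_hat) ** 2)               # error in attribute domain
print(np.isclose(err_coeff, err_attr), psnr(y, y_hat))
```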
4.1 Compression of color attributes
Each point cloud in the 8iVFB dataset is represented by an octree with depth . Therefore and its value will depend on the choice of block sizes. We implement several RAGFTs, each with a different block size at the highest resolution (level ), but with the same block sizes for all other resolutions. When is equal to , , , and , the number of levels is , , and respectively. For the blockGFT we choose block sizes and . Graphs are constructed by adding an edge if the distance between a pair of point coordinates is below a fixed threshold, while edge weights are set to the reciprocal of the distance. Distortion rate curves for two sequences are shown in Figure 3.
The RAGFT provides substantial gains over RAHT. When the block size is smallest , the corresponding RAGFT outperforms RAHT by up to dB for the “longdress” sequence, and up to dB for the “loot” sequence. Similar results were obtained for the other sequences, which are not shown due to lack of space. Coding performance improves as the block size increases, up to dB over RAHT on both sequences. The blockGFT also has this property; however, it requires a large block size () to consistently outperform RAHT for all sequences. This could occur because, for smaller blocks, the blockGFT DC coefficients may still be highly correlated. Since the RAGFT with small blocks can be viewed as an extension of the blockGFT to multiple levels, the transform coefficients of the proposed transform are less correlated.
4.2 Complexity analysis
At each level of RAGFT, multiple GFTs of different sizes are constructed, so that the overall computational complexity is dominated by the number and size of those transforms. At resolution level , the th transform is a matrix. This matrix is obtained by eigendecomposition, which requires roughly operations. As a proxy for the number of operations required when applying the RAGFT we use
(8) 
We consider a collection of point clouds. The th point cloud has points, and the quantity (8) for the RAGFT on that point cloud is denoted by . The complexity proxy for the RAGFT with a given tree structured nested partition is defined as . We compute this quantity for the RAGFT and blockGFT depicted in Figure 3. Our results are shown in Table 1, and are computed from the first point clouds of each sequence of the “8iVFB” dataset, for a total of point clouds. For , the increase in complexity from the blockGFT to the RAGFT is only 2.5%; for the increase is negligible. More importantly, for smaller blocks, the complexity of the RAGFT is orders of magnitude lower than that of the blockGFT.
5 Conclusion
By allowing multiple block sizes and multiple levels of resolution, the proposed RAGFT can be viewed as an intermediate approach between the blockGFT and the RAHT, reaching coding efficiency comparable to that of the blockGFT, with computational complexity slightly higher than that of RAHT. By using a nonseparable transform on larger blocks, the RAGFT can exploit local geometry more efficiently than the RAHT. On the other hand, by applying transforms with small blocks at multiple resolutions, the RAGFT can approach the performance of the blockGFT with reduced complexity. For large transform blocks at resolution , the RAGFT performs better than the blockGFT, with a negligible complexity increase. When the transform sizes at resolution level are smaller, we can outperform the RAHT by 2.5 dB with comparable complexity.
References
[1] (1974) Discrete cosine transform. IEEE Transactions on Computers C-23 (1), pp. 90–93.
[2] (2016) Compression of dynamic 3D point clouds using subdivisional meshes and graph wavelet transforms. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6360–6364.
[3] (2019) A volumetric approach to point cloud compression, part I: attribute compression. IEEE Transactions on Image Processing.
[4] (2016) Attribute compression for sparse point clouds using graph transforms. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 1374–1378.
[5] (2016) Point cloud attribute compression using 3D intra prediction and shape-adaptive transforms. In 2016 Data Compression Conference (DCC), pp. 141–150.
[6] (2017) 8i voxelized full bodies - a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006.
[7] (2016) Compression of 3D point clouds using a region-adaptive hierarchical transform. IEEE Transactions on Image Processing 25 (8), pp. 3947–3956.
[8] (2010) Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 367–374.
[9] (2018) Irregularity-aware graph Fourier transforms. IEEE Transactions on Signal Processing 66 (21), pp. 5746–5761.
[10] (1980) Oct-trees and their use in representing three-dimensional objects. Computer Graphics and Image Processing 14 (3), pp. 249–270.
[11] (2006) Adaptive run-length/Golomb-Rice encoding of quantized generalized Gaussian sources with unknown statistics. In Data Compression Conference (DCC’06), pp. 23–32.
[12] (1982) Geometric modeling using octree encoding. Computer Graphics and Image Processing 19 (2), pp. 129–147.
[13] (2018) Dynamic polygon clouds: representation and compression for VR/AR. APSIPA Transactions on Signal and Information Processing 7.
[14] (2019) Point cloud coding incorporating region of interest coding. In 2019 IEEE International Conference on Image Processing (ICIP).
[15] (2018) Emerging MPEG standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9 (1), pp. 133–148.
[16] (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30 (3), pp. 83–98.
[17] (2016) Graph-based compression of dynamic 3D point cloud sequences. IEEE Transactions on Image Processing 25 (4), pp. 1765–1778.
[18] (2016) Subgraph-based filterbanks for graph signals. IEEE Transactions on Signal Processing 64 (15), pp. 3827–3840.
[19] (1995) Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall.
[20] (2014) Point cloud attribute compression with graph transform. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 2066–2070.