Wavelet Video Coding Algorithm Based on Energy Weighted Significance Probability Balancing Tree

by   Chuan-Ming Song, et al.
NetEase, Inc

This work presents a 3-D wavelet video coding algorithm. By analyzing the contribution of each biorthogonal wavelet basis to the reconstructed signal's energy, we weight each wavelet subband according to its basis energy. Based on the distribution of the weighted coefficients, we further develop a 3-D wavelet tree structure, named the significance probability balancing tree, which places coefficients with similar probabilities of being significant on the same layer. It is implemented as a hybrid of a spatial orientation tree and a temporal-domain block tree. A novel 3-D wavelet video coding algorithm is then proposed based on the energy-weighted significance probability balancing tree. Experimental results illustrate that our algorithm consistently achieves good reconstruction quality for different classes of video sequences. Compared with the asymmetric 3-D orientation tree, the average peak signal-to-noise ratio (PSNR) gains of our algorithm are 1.24 dB, 2.54 dB, and 2.57 dB for the luminance (Y) and chrominance (U, V) components, respectively. Compared with the temporal-spatial orientation tree algorithm, our algorithm gains 0.38 dB, 2.92 dB, and 2.39 dB higher PSNR for the Y, U, and V components, respectively. In addition, the proposed algorithm requires a lower computation cost than the above two algorithms.





1 Introduction

With the rapid development of the Internet, wireless communication, and pervasive computing, many multimedia services have been provided in various applications 01_svc_require , such as video telephony/conferencing, mobile streaming 55_Liu , wireless LAN video, broadband video distribution, professional video manipulation 56_Shah , visual surveillance 50_Wu ; 51_Wu ; 52_Wu , visual retrieval 46_Wang ; 49_Wang , visual recognition 53_Wu , and visual analysis 45_Wang ; 47_Wang ; 48_Wang ; 54_Wu . These applications realize cross-platform, real-time communication for clients with different power, display resolutions, and bandwidths. In these scenarios, progressive video transmission and multi-quality services are required owing to the varying user requirements, client capabilities, and transmission conditions (e.g., noise and congestion) over heterogeneous networks. This poses a great challenge to state-of-the-art video coding techniques and has attracted intensive attention over the past decade. Scalable video coding (SVC) is one of the effective solutions to this problem 02_overview_h264 : a video sequence is encoded once and can be decoded many times in different versions, so as to adapt efficiently to application requirements.

All existing scalable video coding approaches can be divided into two categories. The first category, represented by the MPEG-x and H.26x standards, is based on a closed-loop hybrid prediction and discrete cosine transform (DCT) structure, such as the SVC amendment of H.264/MPEG-4 AVC 03_svc_amendment ; svc_uhd ; svc_avc and the scalable extension of HEVC svc_extension ; svc_hevc ; svc_hevc_pcm ; svc_hevc_low . The second category employs a wavelet-based closed-loop 04_Marpe ; 05_Khan ; 06_zhong or open-loop prediction structure 07_xiong ; 08_chen ; 09_vidwav_report . Its multi-resolution property intrinsically enables the wavelet transform to implement scalable video coding more easily and flexibly than the DCT. In addition, the wavelet transform offers superior nonlinear approximation performance to the DCT, which contributes to improved coding efficiency. Thus, when MPEG (Moving Picture Experts Group) called for proposals for SVC in 2003, a total of 14 schemes were received worldwide, 12 of which addressed scalable video coding with wavelet-based approaches. Moreover, studies show that the closed-loop prediction structure is more efficient than the open-loop one only when the target bitrate is known, while the latter tends to achieve superior or approximately equivalent performance in other conditions 11_lopez . Therefore, 3-D wavelet-based scalable video coding (WSVC) exhibits great potential and has been widely appreciated. Many WSVC approaches have been proposed, such as 3-D SPIHT 07_xiong , MC-EZBC 08_chen , the VidWav platform 09_vidwav_report , and others 12_ding ; 13_cheng ; 14_chen ; 15_xiong ; 16_fang ; 17_tao ; 18_chang . The Joint Video Team (JVT) also originated an ad-hoc group on "further exploration on wavelet video coding" in October 2004 to enhance its coding efficiency 19_wvc_overview . Besides all the features, e.g. spatial and temporal scalability, provided by state-of-the-art scalable coding approaches, wavelet video coding promises additional functionalities 20_wvc_status such as a very high number of spatio-temporal decomposition levels, nondyadic spatial resolutions, extremely fine-grain SNR scalability, and better rate-distortion performance for very-high-resolution material. Comparison studies have demonstrated that WSVC provides better coding performance than the SVC amendment of H.264/MPEG-4 AVC and Motion JPEG 2000 svc_comparison , and that the performance of the SVC amendment and Motion JPEG 2000 depends on the resolution of the coded video sequences. However, the coding efficiency of state-of-the-art wavelet video coding is still slightly inferior to that of the HEVC/H.265 standard. It is thus necessary to investigate wavelet-based video coding further and improve its rate-distortion performance.

This study presents a 3-D wavelet video coding algorithm based on energy-weighted significance probability balancing tree. The basic idea is to weight each subband according to its wavelet basis’ energy. Subsequently, the weighted coefficients are encoded using an asymmetric tree that places the coefficients with similar probabilities of being significant on the same layer. Our algorithm enjoys the following advantages not shared by conventional methods.

  • By exploiting the energy distribution of wavelet bases over different subbands, the energy-based weight can, in theory, facilitate better rate-distortion performance as well as a smaller mean squared error (MSE). Moreover, the subband weight raises the zerotree ratio, decreasing the cost of synchronization information.

  • The significance probability balancing tree puts coefficients with similar probabilities of being significant on the same layer. Hence, coefficients that have a small probability of being significant are moved toward the leaf nodes. In this way, we obtain as many zerotrees as possible and place significant coefficients at the front of the bitstream.

  • Taking into account the intra-scale relationship between neighboring coefficients, the synchronization bits are encoded block by block instead of coefficient by coefficient as in conventional algorithms. Both the synchronization information and the computational complexity are thus efficiently reduced.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 discusses the energy of the biorthogonal wavelet bases of different subbands, as well as the weight of each subband. Section 4 presents a novel 3-D wavelet tree structure based on subband weights. The proposed algorithm is detailed in Section 5. We evaluate our algorithm in Section 6 and conclude in the last section.

2 Related Works

Wavelet-based image and video coding must address two key issues: encoding the magnitude of each significant coefficient, and encoding its location (the so-called "synchronization information") using as few bits as possible. For the first aspect, most algorithms employ bitplane-based successive approximation quantization to encode significant coefficients' magnitudes, while the EBCOT algorithm uses a fractional bitplane technique 21_taubman . These algorithms assume that bits on the same bitplane have equal importance for the reconstructed image or video quality, regardless of whether they lie in the same subband. In fact, this study will illustrate that the energies of the biorthogonal wavelet bases spanning different subbands generally vary. Thus, one bit does not contribute the same energy as a bit on the same bitplane in a different subband, which indicates that bitplane coding will not achieve optimal rate-distortion performance. For the second aspect, EZW 23_Shapiro , SPIHT 24_Said , and SLCCA 25_Chai employ zerotrees to locate significant coefficients according to the magnitude attenuation from coarse to fine scales. In contrast, SPECK 26_Pearlman , EZBC 27_Hsiang , and EBCOT use a block, called a zeroblock, as the unit for transmitting synchronization information, exploiting the intra-scale correlation of wavelet coefficients.

Moinuddin et al. pointed out that the tree structure plays a significant role in improving 3-D wavelet video coding efficiency 32_Moinuddin ; 33_Fowler . Motivated by the zerotree and zeroblock in image compression, researchers symmetrically extended these two 2-D structures to 3-D for video coding 07_xiong ; 08_chen ; 28_Campisi ; 29_Vass ; 30_Kim ; 31_Xu ; 34_Chen ; 35_Khalil ; 36_Minami ; e.g., 28_Campisi , 29_Vass , 07_xiong and 30_Kim , 31_Xu , and 08_chen are, respectively, extensions of EZW, SLCCA, SPIHT, EBCOT, and EZBC. Nevertheless, the wavelet coefficient distribution of still images differs markedly from that of video frames, especially temporal high-pass frames. 37_He calculates the average standard deviation (STD) of the Carphone, Mother & Daughter, and Hall Monitor sequences along the horizontal, vertical, and temporal directions. Statistics show that the STD along the temporal direction is much smaller than those along the other two directions, while the horizontal and vertical STDs are very close. In this case, the amplitudes of temporal high-pass coefficients tend to be smaller than those of spatial high-pass coefficients, so under a symmetric tree structure the probabilities of the coefficients on the same layer being significant are nonuniform. As a result, part of the synchronization information is wasted, inevitably degrading the overall coding efficiency. In general, instead of intuitively extending the zerotree or zeroblock from 2-D to 3-D, the distribution characteristics of the wavelet coefficients should be taken into account when designing a tree structure. This further reduces the overhead of synchronization information and achieves better rate-distortion performance.

Kim et al. use an approximately symmetric tree structure in 07_xiong and obtain superior coding efficiency to the symmetric tree structure. However, the approximately symmetric tree requires that the numbers of wavelet decomposition levels along the spatial and temporal directions be equal, a restriction that prevents fully exploiting the redundancy along the temporal direction. To address this issue, an asymmetric 3-D orientation tree is proposed in 37_He , which attaches all subbands together to form a longer subband tree. This asymmetric 3-D orientation tree places no limitation on the number of decomposition levels along each direction and outperforms 3-D SPIHT. Compared with the symmetric tree structure, the asymmetric structure in 37_He gathers more insignificant coefficients and therefore requires fewer synchronization bits. In order to build a longer tree, a virtual zerotree 12_ding ; 38_Khan is proposed as an extension of existing tree structures. It virtually creates zerotrees in a subband so that the significance map can be coded more efficiently, although no decimation or decomposition actually takes place. However, 12_ding ; 37_He ; 38_Khan still ignore the nonuniform probabilities of spatio-temporal coefficients on the same layer being significant. Considering this fact, Zhang et al. 39_Zhang decompose a spatio-temporal orientation tree into a temporal direction tree and a spatial orientation tree. Moreover, the temporal direction trees are encoded with a higher priority than the spatial orientation trees. Only when a temporal direction tree contains a significant coefficient is the spatial orientation tree to which that coefficient belongs scanned. This method, however, may delay the coding of isolated zeros in the spatial orientation trees.

We have discussed the inter-scale correlation of wavelet coefficients above. There is in fact another correlation, namely the intra-scale correlation. For a video sequence following a stationary random process, the intra-scale correlation is even stronger than the inter-scale correlation 40_Song . On the basis of the asymmetric spatio-temporal orientation tree, 32_Moinuddin ; 41_Moinuddin introduce a fixed-size zeroblock structure into 3-D video coding and code the synchronization information using the zeroblock as a unit. Since this method takes advantage of both inter- and intra-scale correlations, its video coding efficiency is effectively improved. However, 32_Moinuddin ; 41_Moinuddin treat the coefficients in different subbands indiscriminately. In the next section, we discuss the unequal roles, in terms of energy, that different subbands play in the reconstructed image/video quality.

3 Energy-Based Weight of Wavelet Coefficients

Most state-of-the-art wavelet-based video coders employ the biorthogonal wavelet transform and encode significant coefficients bitplane by bitplane. These approaches assume the bits on the same bitplane in different subbands are of equal importance. However, 42_Usevitch ; 43_Usevitch point out that the biorthogonal wavelet transform does not have the energy-preserving property. Consequently, coefficients in separate subbands contribute differently to the reconstructed signal energy.

Let h and g be the low-pass and high-pass filters of the biorthogonal wavelet, whose lengths are N_h and N_g, respectively, and let h̃ and g̃ be their dual (synthesis) filters. The low-frequency and high-frequency coefficients at scale j are denoted by c_j and d_j, whose lengths are M_j and L_j. Then the J-level inverse wavelet transform can be expressed as follows:

c_{j-1} = (c_j ↑2) * h̃ + (d_j ↑2) * g̃,   j = J, J-1, …, 1,

where ↑2 and "*" are the up-sampling and convolution operators, respectively. Because the wavelet transform is linear, the reconstruction of a sum of coefficients equals the sum of their individual reconstructions. Thus we only need to examine the energy reconstructed from a unit coefficient in each subband to show that the bases of separate subbands contribute differently to the reconstructed image/video.

Theorem 3.1 (Wavelet bases’ energy)

Let c_J(m) = a (a a constant, a ≠ 0) at some position m, while the other coefficients in c_J and all coefficients in each d_j are 0. Suppose the reconstructed energy after the inverse transform is E_L. Likewise, set d_J(m) = a (a ≠ 0), while the other coefficients in d_J and all coefficients in c_J are 0. Let the energy after the inverse transform now be E_H. Then E_L ≠ E_H.

Proof: In the case of c_J(m) = a, from the procedure of the inverse wavelet transform, a nonzero output value is produced only where the coefficients of the synthesis filter h̃ coincide with position m. Then E_L = a² Σ_k h̃(k)², where h̃(k) stands for the k-th filter coefficient of h̃. Similarly, we obtain E_H = a² Σ_k g̃(k)² in the case of d_J(m) = a. For all biorthogonal wavelets commonly used in image and video compression, e.g. the Daubechies 9/7 and the 5/3 wavelet, the sums of squared coefficients of h̃ and g̃ are not equal. Therefore, we conclude that E_L ≠ E_H.
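Theorem 3.1 can be checked numerically. The sketch below (plain Python; it assumes the standard JPEG 2000 LeGall 5/3 synthesis filters and ignores boundary handling) performs a one-level inverse transform of a single unit coefficient placed first in the low-pass and then in the high-pass band, and compares the reconstructed energies.

```python
# Numerical check of Theorem 3.1 for the LeGall 5/3 wavelet: a unit
# coefficient in the low-pass band and one in the high-pass band
# reconstruct to signals of different energy.

def upsample(x):
    """Insert a zero after every sample (the up-sampling operator)."""
    out = []
    for v in x:
        out.extend([v, 0.0])
    return out

def convolve(x, h):
    """Full linear convolution of sequence x with filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def energy(x):
    return sum(v * v for v in x)

# LeGall 5/3 synthesis (dual) filters as used in JPEG 2000.
h_syn = [0.5, 1.0, 0.5]                       # low-pass
g_syn = [-0.125, -0.25, 0.75, -0.25, -0.125]  # high-pass

# One-level inverse transform of one unit coefficient per band; the
# all-zero other band contributes nothing, so its term is omitted.
c = [0.0] * 8; c[3] = 1.0      # unit coefficient in the low-pass band
E_low = energy(convolve(upsample(c), h_syn))

d = [0.0] * 8; d[3] = 1.0      # unit coefficient in the high-pass band
E_high = energy(convolve(upsample(d), g_syn))

print(E_low, E_high)           # 1.5 vs 0.71875: the bases' energies differ
```

The square roots of these two energies (about 1.22 and 0.85) are precisely the kind of per-subband weights tabulated in Table 1.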

Theorem 3.1 indicates that we cannot achieve the minimum MSE if we employ the traditional bitplane technique to encode all subbands' coefficients uniformly. We thus propose to weight the wavelet coefficients subband by subband before bitplane coding. The weight of each subband equals the square root of its unit coefficient's reconstruction energy. Note that the subband weight varies with its corresponding wavelet basis. Table 1 lists the weights of the 160 spatio-temporal subbands obtained by a 4-level 5/3 temporal decomposition followed by a 3-level 9/7 spatial decomposition, in which the rows and columns represent temporal and spatial subbands, respectively. From Table 1 we find that the lower the subband frequency, the larger the weight. Moreover, there are obvious differences among the weights of different subbands. This result not only verifies Theorem 3.1, but also illustrates the necessity of weighting the wavelet coefficients.

9.71 7.31 7.31 5.50 5.28 5.28 3.87 4.04 4.04 3.15
4.03 3.04 3.04 2.28 2.19 2.19 1.61 1.68 1.68 1.31
2.98 2.24 2.24 1.69 1.62 1.62 1.19 1.24 1.24 0.97
3.98 2.99 2.99 2.25 2.16 2.16 1.58 1.66 1.66 1.29
2.17 1.63 1.63 1.22 1.18 1.18 0.87 0.90 0.90 0.70
2.33 1.75 1.75 1.32 1.27 1.27 0.93 0.97 0.97 0.75
2.49 1.88 1.88 1.41 1.36 1.36 0.99 1.04 1.04 0.81
2.77 2.09 2.09 1.57 1.51 1.51 1.10 1.15 1.15 0.90
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.06 1.55 1.55 1.17 1.12 1.12 0.82 0.86 0.86 0.67
2.13 1.60 1.60 1.20 1.15 1.15 0.85 0.85 0.88 0.67
1.94 1.46 1.46 1.10 1.06 1.06 0.77 0.81 0.81 0.67
Table 1: Weights of Spatio-temporal Subbands After a 4-level Temporal Decomposition Followed by a 3-level Spatial Decomposition
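As a minimal illustration of how such weights would be applied, the sketch below multiplies a subband's coefficients by its weight before bitplane coding and divides the weight back out at the decoder. The subband labels are hypothetical; the two weight values are taken from the first row of Table 1.

```python
# Sketch of the subband weighting step (Section 3).  Labels are
# illustrative only; weight values come from the first row of Table 1.
weights = {"lowest-frequency subband": 9.71, "highest-frequency subband": 3.15}

def weight_subband(coeffs, w):
    """Scale coefficients by the subband weight before bitplane coding."""
    return [w * c for c in coeffs]

def unweight_subband(coeffs, w):
    """Inverse scaling performed by the decoder after bitplane decoding."""
    return [c / w for c in coeffs]

band = [12.0, -3.5, 0.25]
w = weights["highest-frequency subband"]
restored = unweight_subband(weight_subband(band, w), w)
assert all(abs(a - b) < 1e-9 for a, b in zip(band, restored))
```

Since the weight is applied before quantization, one bit spent on a heavily weighted (high-energy) subband removes more reconstruction distortion than a bit spent on a lightly weighted one.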

4 3-D Significance Probability Balancing Tree

After weighting each subband, an appropriate tree structure is needed to better locate significant coefficients. Most previous asymmetric structures pursue tree depth without considering the distribution of node importance on the same tree level. Table 2 compares the average energy of the spatial and temporal high-frequency coefficients of the first 16 frames of the "Foreman" and "Mobile & Calendar" sequences after the 3-D wavelet transform. It can be seen that the average energies of the different high-frequency subbands vary considerably, with certain subbands having markedly larger energy than the others. The average amplitude of the spatial high-frequency coefficients is larger than that of the temporal high-frequency coefficients, which indicates that the probability of a spatial high-frequency coefficient being significant is higher than that of a temporal high-frequency coefficient. If both are placed on the same level, the latter must be tested repeatedly before becoming significant, wasting synchronization bits. To address this issue, we present a novel 3-D tree structure named the significance probability balancing tree.

Foreman: Spatial Subbands' Energy | Foreman: Temporal Subbands' Energy | Mobile & Calendar: Spatial Subbands' Energy | Mobile & Calendar: Temporal Subbands' Energy
39779 6005 119298 5308
1133 2172 2344 2056
319 199 390 62
757 515 1158 247
53 36 56 20
62 36 88 21
76 32 123 26
202 75 239 33
Table 2: Average Energy Comparison Between Spatial and Temporal High-frequency Coefficients of Foreman and Mobile & Calendar Sequences After 3-D Wavelet Transform

Our basic idea is to place coefficients with similar probabilities of being significant on the same layer, based on the amplitude correlation of spatio-temporal coefficients. One way to construct a significance probability balancing tree is to study the coefficient distribution before each scan and then establish an adaptive tree structure, but the high computational demand would inhibit its practical use. Our proposed method instead processes the descendants of a parent node along its spatial and temporal orientations, respectively. The spatial descendants are arranged using the spatial orientation tree of SPIHT, while the temporal descendants are organized from coarse to fine scales along the temporal direction, selecting a spatial node with no offspring as the root. Fig. 1 illustrates the parent-child relationship of the proposed tree. For the sake of clarity, only four temporal frames are shown, with a -level temporal decomposition followed by a -level spatial decomposition.

Figure 1: Parent-child relationship of the proposed 3-D significance probability balancing tree

Furthermore, adjacent wavelet coefficients in a subband tend to have similar importance 40_Song . We thus group adjacent coefficients into blocks in the temporal high-frequency frames. Each block is treated as an offspring of the coefficient at the corresponding position in the temporal low-frequency subband. The structure discussed above is named the "temporal-domain block tree". Only when a block contains the root(s) of nonzero subtree(s) is the block divided into four coefficients, one of which becomes the root of a temporal-domain block tree; the other three are the roots of three spatial orientation trees. Therefore, the proposed significance probability balancing tree is essentially a hybrid of the spatial orientation tree and the temporal-domain block tree.
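The two parent-child rules combined in the hybrid tree can be sketched as below. The spatial rule is the standard SPIHT orientation tree; the temporal rule shown (a node owning a 2x2 child block at the same spatial position in the next finer temporal band) is an illustrative simplification of the temporal-domain block tree, not the authors' exact indexing.

```python
# Sketch of the two parent-child rules in the hybrid tree.  The temporal
# rule below is a hypothetical simplification for illustration.

def spatial_offspring(i, j, rows, cols):
    """SPIHT spatial orientation tree: node (i, j) has the four children
    (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1) that lie inside the band."""
    kids = [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]
    return [(r, c) for r, c in kids if r < rows and c < cols]

def temporal_child_block(t, i, j):
    """Hypothetical temporal-domain rule: a node in temporal band t owns a
    2x2 block at the same spatial position in the next finer band t + 1."""
    return [(t + 1, i, j), (t + 1, i, j + 1),
            (t + 1, i + 1, j), (t + 1, i + 1, j + 1)]

print(spatial_offspring(1, 2, 16, 16))   # [(2, 4), (2, 5), (3, 4), (3, 5)]
print(temporal_child_block(0, 4, 6))
```

Because the temporal rule hands out whole 2x2 blocks, a single insignificance bit can cover four temporal coefficients at once, which is where the block-wise saving of synchronization bits comes from.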

When using our proposed tree to organize 3-D wavelet coefficients, the synchronization information is coded block by block instead of coefficient by coefficient. This block-wise manner effectively reduces the number of synchronization bits. To verify this point, we decompose the first 16 frames of "Foreman", "Hall Monitor", and "Mobile & Calendar" by the 3-D wavelet transform and represent the resulting coefficients using both our proposed tree and a typical asymmetric tree 37_He . We then calculate the ratios of degree-1 and degree-2 zerotrees 44_Cho under the two tree structures. As shown in Table 3, the proposed significance probability balancing tree achieves a higher zerotree ratio than the asymmetric tree 37_He . Fig. 2 presents part of the quantized coefficients of two temporal frames of "Foreman", where "2", "1", and "0" denote coded, significant, and insignificant coefficients, respectively, and the quantization step size is 256. Suppose the coding of sign bits is ignored. Then the asymmetric tree 37_He needs a total of 40 bits, namely "1101010100110001000010011100000000100010", to code the significance map, while the significance probability balancing tree requires only 35 bits, i.e. "0110101010011 00010111000000000100 01". Similar results are obtained for the other two sequences. Note that the above three test sequences belong to different classes of videos in the MPEG-4 test library: "Foreman" has low spatial detail and a medium amount of movement, "Mobile & Calendar" has high spatial detail and a medium amount of movement, and "Hall Monitor" contains low spatial detail and a low amount of movement. Consequently, we conclude that our proposed tree consumes fewer synchronization bits than a typical asymmetric tree in general.

Figure 2: Part of the quantized coefficients of two temporal frames of “Foreman”
Scan No. | Foreman: Asymmetric Tree, Proposed Tree | Hall Monitor: Asymmetric Tree, Proposed Tree | Mobile & Calendar: Asymmetric Tree, Proposed Tree
Scan 100.00 100.00 100.00 100.00 100.00 100.00
Scan 100.00 100.00 99.79 100.00 98.26 100.00
Scan 90.91 98.96 88.96 99.92 70.74 92.63
Scan 77.12 92.72 76.39 97.44 59.15 69.17
Scan 63.62 80.79 70.06 87.52 53.85 61.31
Scan 59.20 67.19 62.81 77.99 49.86 57.23
Scan 55.94 63.35 56.38 71.60 41.40 56.18
Table 3: The Zerotree Ratio Between Our Tree Structure and Asymmetric Tree Structure (%)

5 Implementation of Proposed 3-D Wavelet Video Coding Algorithm

Based on the weighted coefficients and significance probability balancing tree, this section presents a novel 3-D wavelet video coding algorithm.

Similar to the SPIHT algorithm, we employ three ordered lists to store the coefficients' significance information, namely the list of insignificant sets (LIS), the list of insignificant pixels (LIP), and the list of significant pixels (LSP). To facilitate processing the descendants of a node i, we use D(i) to denote the coordinate set of all descendants of i, and L(i) to denote the coordinate set of all descendants of i excluding its offspring. An element in the LIS represents either D(i) or L(i); to differentiate between them, we call the element a "TYPE_A entry" if it represents D(i), and a "TYPE_B entry" if it represents L(i). Letting T_n and T_0 denote the quantization threshold of the n-th scan and the initial threshold, respectively, we detail the implementation of our proposed coding algorithm below.

  1. Parse the input video into a number of GOPs, and then apply 3-D wavelet transform to each GOP.

  2. Initialization.

    1. Calculate the initial threshold T_0 for each GOP as follows:

       T_0 = 2^⌊log₂( max_{c ∈ C} |c| )⌋,

      where C represents the 3-D wavelet coefficient set.

    2. Set n = 0 and LSP = ∅. Add the coefficients of the lowest subband in the temporally lowest-frequency frame, e.g. the upper-left “a”, “b”, “c”, and “d” of the first frame as depicted in Fig. 1, to the LIP and LIS, setting them in the LIS as TYPE_A entries.

  3. Search for significant coefficients.

    1. Compare each coefficient c(i) in the LIP with T_n. If |c(i)| ≥ T_n, output “1” and its sign bit, and then move c(i) to the LSP. Otherwise, output “0”.

    2. For each untreated entry i in the LIS, if it is a node of a spatial orientation tree, e.g. “b”, “c”, and “d” of the first frame in Fig. 1, go to Step 3.3. Otherwise, if it is a node of a temporal-domain block tree, e.g. “a” of the first frame in Fig. 1, go to Step 3.4.

    3. Code the entry using the SPIHT procedure, then go to Step 3.5.

    4. If i is a TYPE_B entry, go to Step 3.4.4. Else, if D(i) does not contain significant coefficients, output “0” and go to Step 3.5. Otherwise, output “1”.

      1. Check whether there are significant coefficients in i’s temporal-domain child blocks, e.g. the upper-left “a”, “b”, “c”, and “d” of the second frame depicted in Fig. 1. If only insignificant coefficients are contained, output “0” and move the blocks into the LIP. Go to Step 3.4.3.

      2. Otherwise, output “1” and test each node in every child block of i. For each significant coefficient, output “1” and its sign bit and move it into the LSP; for each insignificant coefficient, output “0” and move it to the LIP.

      3. If L(i) = ∅, remove i from the LIS; otherwise, move i to the end of the LIS as a TYPE_B entry and go to Step 3.5.

      4. If L(i) does not contain significant coefficients, output “0”. Otherwise, output “1”, split the current block into one root of a temporal-domain block subtree and three roots of spatial orientation trees, move the four roots into the LIS as TYPE_A entries, and remove i from the LIS.

    5. If all LIS entries have been coded, go to Step 4. Otherwise, go to Step 3.2.

  4. Refine the amplitudes of significant coefficients.

    1. For each entry in the LSP, if the current bitplane bit of its magnitude is 0, output “0”. Otherwise, output “1”.

    2. If the target rate has been reached, return. Otherwise, set n = n + 1 and T_n = T_{n-1}/2.

    3. If the minimum threshold has been reached, return. Else, go to Step 3.
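The scan-threshold schedule underlying Steps 2-4 can be sketched as follows. The coefficient values are hypothetical, and the stopping condition (threshold dropping below 1) is an assumption made for the sketch, since the exact stopping criterion is not restated here.

```python
import math

def initial_threshold(coeffs):
    """T_0 = 2**floor(log2 of the largest magnitude), as in Step 2.1."""
    return 2 ** int(math.floor(math.log2(max(abs(c) for c in coeffs))))

coeffs = [173.0, -41.2, 9.8, 0.4]        # hypothetical weighted coefficients
T = initial_threshold(coeffs)            # 128 for this example
scans = []
while T >= 1:                            # assumed stopping condition
    significant = [c for c in coeffs if abs(c) >= T]
    scans.append((T, len(significant)))  # (threshold, significant count)
    T //= 2                              # halve the threshold for the next scan
print(scans)
```

Each halving of the threshold corresponds to one more bitplane of the (weighted) magnitudes; the significance and refinement passes of Steps 3 and 4 are run once per threshold value.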

6 Experimental Results and Analysis

Extensive experiments were conducted on seven color video sequences comprising the first 128 frames of “Foreman”, “Hall Monitor”, “Mobile & Calendar”, “Coastguard”, “Mother & Daughter”, “Miss America”, and “Bus” in CIF@30Hz format. All experiments were performed on the VidWav platform 09_vidwav_report . The 3-D wavelet transform was performed in the “t+2D” manner, with -level 5/3 motion-compensated temporal filtering followed by a -level 2-D 9/7 wavelet transform. Motion estimation was carried out with quarter-pixel accuracy. For each color video sequence, the Y, U, and V components were encoded sequentially.

To verify the effectiveness of our algorithm, we compare it against two representative methods, i.e. the asymmetric 3-D orientation tree 37_He and the temporal-spatial orientation tree 39_Zhang , in terms of peak signal-to-noise ratio (PSNR). Note that the PSNR statistics of 37_He were obtained on the VidWav platform, while the results of the temporal-spatial orientation tree were extracted from 39_Zhang , which only presents PSNRs for “Foreman”, “Miss America”, and “Mobile & Calendar”. Table 4 shows the PSNR comparison among the three coding algorithms at 128 Kbps-1500 Kbps for “Miss America”, “Foreman”, and “Mobile & Calendar”. Tables 5-8 list the comparison results between the asymmetric 3-D orientation tree and our proposed algorithm for “Hall Monitor”, “Mother & Daughter”, “Coastguard”, and “Bus” at 128 Kbps-1500 Kbps.

As can be seen from Tables 4-8, the asymmetric 3-D orientation tree outperforms the temporal-spatial orientation tree for sequences with a low amount of movement, such as Miss America and Foreman, while the latter achieves superior efficiency for sequences with high spatial detail and a medium amount of movement, such as Mobile & Calendar. This indicates that the two tree structures have distinct merits and that their performance depends on the characteristics of the video sequences. Since our algorithm takes into account the coefficients' significance probability distribution on the same tree level, it is less sensitive to video characteristics and achieves the highest PSNR for all test sequences. For the Y component, the average PSNR of the proposed algorithm is 1.24 dB and 0.38 dB higher than those of 37_He and 39_Zhang , respectively. For the U and V components, our algorithm gains 2.54 dB and 2.57 dB higher PSNR compared with 37_He , and 2.92 dB and 2.39 dB higher PSNR than 39_Zhang . It is worth mentioning that for the Y component of “Mobile & Calendar”, the PSNR achieved by our algorithm is lower than that of 39_Zhang , as shown in Table 4. According to the experimental results presented in 39_Zhang , the PSNR improvement of 39_Zhang over 37_He is 0.18 dB, 0.34 dB, and 0.41 dB at 500 Kbps, 1000 Kbps, and 1500 Kbps, respectively, whereas our algorithm obtains 1.92 dB, 1.26 dB, and 1.49 dB higher PSNR than 37_He at the same bitrates. In this sense, our algorithm still outperforms 39_Zhang for the Y component of “Mobile & Calendar”.

Test Sequence | Bitrate (Kbps) | PSNR (dB): Asymmetric Tree (Y, U, V) | Temporal-spatial Tree (Y, U, V) | Proposed Tree (Y, U, V)
Miss America 128 39.99 38.35 40.41 —– —– —– 40.88 39.10 41.99
256 40.57 38.73 41.01 —– —– —– 42.38 39.92 43.35
384 41.91 39.84 42.14 —– —– —– 43.05 40.45 43.95
500 42.34 40.23 42.54 41.76 40.64 42.27 43.38 40.83 44.17
768 42.78 40.68 43.11 —– —– —– 44.04 41.84 44.68
1000 42.88 40.84 43.23 43.23 40.69 43.46 44.36 42.51 45.00
1500 43.96 42.32 44.42 43.99 40.69 43.46 44.84 43.82 45.52
Foreman 128 29.04 35.45 35.41 —– —– —– 29.80 37.53 37.39
256 32.30 37.87 38.38 —– —– —– 33.32 39.79 40.43
384 33.79 38.70 39.86 —– —– —– 35.07 40.90 42.10
500 35.04 39.52 40.97 34.84 37.63 39.42 36.18 41.71 43.08
768 36.37 40.49 42.20 —– —– —– 37.81 43.12 44.51
1000 37.54 41.26 43.26 37.76 39.63 41.76 38.96 44.01 45.37
1500 38.99 42.72 44.71 39.40 40.50 42.62 40.51 45.42 46.66
Mobile & Calendar 128 20.38 25.99 24.75 —– —– —– 20.76 27.87 26.38
256 23.15 28.28 27.19 —– —– —– 23.99 30.31 29.06
384 24.63 29.21 28.45 —– —– —– 25.71 31.84 30.67
500 25.05 29.52 28.83 27.38 30.92 30.97 26.97 33.54 32.44
768 27.55 31.28 30.92 —– —– —– 28.57 35.19 34.35
1000 28.56 32.28 32.07 31.26 34.38 34.55 29.82 36.82 36.01
1500 30.00 33.70 33.36 33.49 35.77 35.99 31.49 38.47 37.76
Table 4: PSNR Comparison Among Three Test Algorithms at Different Bitrates
Bitrate Asymmetric 3-D Tree Proposed Algorithm
(Kbps) Y U V Y U V
128 31.01 36.47 38.88 31.96 37.68 40.21
256 34.64 37.57 39.72 36.46 39.28 41.46
384 36.71 38.42 40.57 38.10 39.89 42.14
500 37.32 38.72 40.82 38.96 40.32 42.48
768 38.54 39.77 41.57 40.14 41.16 43.03
1000 39.22 40.69 42.08 40.74 41.71 43.43
1500 39.68 41.43 42.52 41.70 42.66 44.16
Table 5: PSNR Comparison Between Asymmetric 3-D Orientation Tree Algorithm and Our Algorithm at Different Bitrates for “Hall Monitor” sequence
Bitrate Asymmetric 3-D Tree Proposed Algorithm
(Kbps) Y U V Y U V
128 35.20 40.52 41.18 36.25 42.90 43.40
256 38.57 42.40 43.54 39.55 44.85 45.73
384 40.27 43.56 44.71 41.19 45.97 46.67
500 41.36 44.45 45.48 42.36 46.64 47.42
768 42.48 45.50 46.20 43.83 47.56 48.42
1000 43.59 46.26 47.00 44.70 48.10 48.98
1500 44.56 47.14 47.91 45.62 48.64 49.52
Table 6: PSNR Comparison Between Asymmetric 3-D Orientation Tree Algorithm and Our Algorithm at Different Bitrates for “Mother & Daughter” sequence
Bitrate Asymmetric 3-D Tree Proposed Algorithm
(Kbps) Y U V Y U V
128 24.96 37.63 36.24 25.58 40.40 41.66
256 26.35 38.40 37.51 27.61 41.63 43.32
384 27.94 40.06 41.06 28.89 42.45 44.14
500 28.67 40.29 41.16 29.80 42.99 44.54
768 29.88 40.89 42.09 31.29 43.64 45.18
1000 31.32 41.38 43.14 32.52 44.22 45.73
1500 33.10 42.10 43.66 34.30 45.02 46.48
Table 7: PSNR Comparison Between Asymmetric 3-D Orientation Tree Algorithm and Our Algorithm at Different Bitrates for “Coastguard” sequence
Bitrate Asymmetric 3-D Tree Proposed Algorithm
(Kbps) Y U V Y U V
128 21.04 31.49 29.87 21.43 34.56 35.67
256 24.42 34.40 34.95 25.14 36.49 37.79
384 25.84 34.99 35.71 27.01 37.36 38.87
500 27.28 35.94 37.13 28.09 37.82 39.45
768 29.13 36.62 38.03 30.37 39.16 40.92
1000 30.10 37.30 38.76 31.48 39.69 41.44
1500 32.41 38.30 39.99 34.03 41.37 43.01
Table 8: PSNR Comparison Between Asymmetric 3-D Orientation Tree Algorithm and Our Algorithm at Different Bitrates for “Bus” sequence

In addition, the ratio of zerotrees increases after the wavelet coefficients are weighted, as illustrated in Table 3. Thus the number of isolated zeroes, which entail many comparisons and input/output operations in conventional zerotree coding algorithms, is effectively reduced. Further, since the proposed temporal-domain block tree operates on block units rather than individual coefficients, the number of entries in the LIP and LIS is smaller than in 37_He and 39_Zhang. On one hand, the energy-based weighting and the temporal-domain block tree improve the efficiency of the synchronization information; on the other hand, they lower the computational complexity of the video coding algorithm. Of course, as the target bitrate increases and the temporal-domain block trees are recursively split, the computational complexity gradually approaches that of 37_He and 39_Zhang.
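The complexity argument above can be illustrated with a toy one-dimensional sketch (hypothetical helper names; this is not the paper's actual temporal-domain block tree, which operates on 3-D wavelet data). An insignificant block is dismissed with a single symbol, so the more coefficients that stay below threshold after weighting, the fewer nodes must be visited:

```python
def is_significant(block, threshold):
    """A block is significant if any coefficient magnitude reaches the threshold."""
    return any(abs(c) >= threshold for c in block)

def code_block_tree(block, threshold, symbols):
    """Recursively split a significant block, emitting one symbol per visited node.
    An insignificant block (zero-block) terminates with a single '0' symbol,
    which is where the savings over per-coefficient testing come from."""
    if not is_significant(block, threshold):
        symbols.append("0")  # whole block dismissed at once
        return
    symbols.append("1")
    if len(block) == 1:
        return
    mid = len(block) // 2
    code_block_tree(block[:mid], threshold, symbols)
    code_block_tree(block[mid:], threshold, symbols)

coeffs = [0, 1, 0, 0, 9, 0, 0, 0]
out = []
code_block_tree(coeffs, 8, out)
print("".join(out))  # 1011100 -- 7 symbols instead of 8 per-coefficient tests
```
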

7 Conclusions

In this study, by analyzing the contribution of each biorthogonal wavelet basis to the reconstructed signal's energy, we proposed weighting each subband by the energy of its corresponding basis before encoding. Based on the distribution of the weighted coefficients, we put forward the concept of a 3-D significance probability balancing tree and implemented it using a hybrid of spatial orientation trees and temporal-domain block trees. On this basis, a novel 3-D wavelet video coding algorithm was presented and its effectiveness verified through extensive experiments. We believe that our study will be useful in future research and development of wavelet-based scalable video coding.
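As a concrete illustration of the energy-weighting idea summarized above, the weight of a subband can be taken as the L2 norm of its synthesis basis, which differs from 1 for a non-orthogonal filter bank. The sketch below uses the LeGall 5/3 synthesis filters purely as a familiar biorthogonal example; the filter choice and helper names are assumptions, not the paper's exact setup:

```python
import math

def upsample(f):
    """Insert a zero between consecutive taps (dyadic upsampling)."""
    out = []
    for x in f:
        out += [x, 0.0]
    return out[:-1]

def convolve(a, b):
    """Full linear convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def norm(f):
    """L2 norm of a filter, used here as the subband energy weight."""
    return math.sqrt(sum(x * x for x in f))

# LeGall 5/3 synthesis filters (a common biorthogonal example)
g0 = [0.5, 1.0, 0.5]                       # lowpass synthesis
g1 = [-0.125, -0.25, 0.75, -0.25, -0.125]  # highpass synthesis

# deeper levels cascade the upsampled filter through the lowpass branch
basis_H1 = g1                          # level-1 detail basis
basis_H2 = convolve(upsample(g1), g0)  # level-2 detail basis
for name, b in [("H1", basis_H1), ("H2", basis_H2)]:
    print(name, round(norm(b), 3))  # H1 0.848, H2 0.96
```

Because these norms deviate from 1, a unit quantization error contributes unequal energy to the reconstruction depending on the subband, which is what motivates scaling each subband by its basis energy before zerotree coding.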

This work is supported by the National Natural Science Foundation of China (NSFC) under Grant nos. 61402214, 41671439, and 61702246, by the Open Foundation of the State Key Laboratory for Novel Software Technology of Nanjing University under Grant no. KFKT2018B07, and by the Dalian Foundation for Youth Science and Technology Star (2015R069).


  • (1) SVC requirements specified by MPEG, JVT-N026, Tech. rep., ISO/IEC JTC1/SC29/WG11, Hong Kong (2005).
  • (2) Y. Liu, J. Y. B. Lee, Post-streaming rate analysis a new approach to mobile video streaming with predictable performance, IEEE Transactions on Mobile Computing 16 (12) (2017) 3488–3501.
  • (3) R. Shah, P. J. Narayanan, Interactive video manipulation using object trajectories and scene backgrounds, IEEE Transactions on Circuits and Systems for Video Technology 23 (9) (2013) 1565–1576.
  • (4) L. Wu, Y. Wang, J. Gao, X. Li, Deep adaptive feature embedding with local sample distributions for person re-identification, Pattern Recognition 73 (1) (2018) 275–288.
  • (5) L. Wu, Y. Wang, X. Li, J. Gao, What-and-where to match: Deep spatially multiplicative integration networks for person re-identification, Pattern Recognition 76 (1) (2018) 727–738.
  • (6) L. Wu, Y. Wang, Z. Ge, Q. Hu, X. Li, Structured deep hashing with convolutional neural networks for fast person re-identification, Computer Vision and Image Understanding 167 (1) (2018) 63–73.
  • (7) Y. Wang, X. Lin, L. Wu, W. Zhang, Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval, IEEE Transactions on Image Processing 26 (3) (2017) 1393–1404.
  • (8) Y. Wang, L. Wu, Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering, Neural Networks 103 (1) (2018) 1–8.
  • (9) L. Wu, Y. Wang, X. Li, J. Gao, Deep attention-based spatially recursive networks for fine-grained visual recognition, IEEE Transactions on Cybernetics 99 (PP). doi:10.1109/TCYB.2018.2813971.
  • (10) Y. Wang, X. Lin, L. Wu, W. Zhang, Q. Zhang, X. Huang, Robust subspace clustering for multi-view data by exploiting correlation consensus, IEEE Transactions on Image Processing 24 (11) (2015) 3939–3949.
  • (11) Y. Wang, W. Zhang, L. Wu, X. Lin, X. Zhao, Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion, IEEE Transactions on Neural Networks and Learning Systems 28 (1) (2017) 57–70.
  • (12) Y. Wang, W. Zhang, L. Wu, X. Lin, M. Fang, S. Pan, Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering, in: Proc. International Joint Conference on Artificial Intelligence, Vol. 1, New York, USA, 2016, pp. 2153–2159.
  • (13) Y. Wang, L. Wu, X. Lin, J. Gao, Multiview spectral clustering via structured low-rank matrix factorization, IEEE Transactions on Neural Networks and Learning Systems 99 (PP). doi:10.1109/TNNLS.2017.2777489.
  • (14) H. Schwarz, D. Marpe, T. Wiegand, Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video Technology 17 (9) (2007) 1103–1120.
  • (15) T. Wiegand, G. J. Sullivan, J. Reichel, H. Schwarz, H. Wien, Joint draft 11 of SVC amendment, Doc. JVT-X201, Tech. rep., Geneva, Switzerland (2007).
  • (16) U.-K. Park, H. Choi, J. W. Kang, J.-G. Kim, Scalable video coding with large block for UHD video, IEEE Transactions on Consumer Electronics 28 (3) (2012) 932–940.
  • (17) A. Bjelopera, S. Grgic, Scalable video coding extension of H.264/AVC, in: Proc. International Symposium ELMAR, Vol. 1, Zadar, Croatia, 2012, pp. 7–12.
  • (18) ISO/IEC JTC 1/SC 29/WG 11 and ITU-T SG 16 WP 3, Joint call for proposals on scalable video coding extensions of high efficiency video coding (HEVC), Tech. Rep. N12957, Stockholm, Sweden (2012).
  • (19) Z. Shi, X. Sun, F. Wu, Spatially scalable video coding for HEVC, IEEE Transactions on Circuits and Systems for Video Technology 22 (12) (2012) 1813–1826.
  • (20) G. Wu, W. Ding, Y. Shi, B. Yin, Adaptive weighted prediction for scalable video coding based on HEVC, in: Proc. Pacific Rim Conference on Multimedia (PCM), Vol. 1, London, 2013, pp. 110–121.
  • (21) S. Lasserre, F. L. Leannec, J. Taquet, E. Nassor, Low-complexity intra coding for scalable extension of HEVC based on content statistics, IEEE Transactions on Circuits and Systems for Video Technology 24 (2014) (to be published).
  • (22) D. Marpe, H. L. Cycon, Very low bit-rate video coding using wavelet-based techniques, IEEE Transactions on Circuits and Systems for Video Technology 9 (1) (1999) 85–94.
  • (23) E. Khan, M. Ghanbari, An efficient and scalable low bit-rate video coding with virtual SPIHT, Signal Processing: Image Communication 19 (3) (2004) 267–283.
  • (24) M. S. Zhong, M. Ghanbari, Motion compensation based on wavelet coefficient blocks, Acta Automatica Sinica 30 (1) (2004) 64–69.
  • (25) B. Kim, Z. Xiong, W. A. Pearlman, Low bit rate, scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT), IEEE Transactions on Circuits and Systems for Video Technology 10 (12) (2000) 1374–1387.
  • (26) P. S. Chen, J. W. Woods, Bidirectional MC-EZBC with lifting implementation, IEEE Transactions on Circuits and Systems for Video Technology 14 (10) (2004) 1183–1194.
  • (27) ISO/IEC JTC1/SC29/WG11, Wavelet codec reference document and software manual, Tech. Rep. ISO/MPEG Video, Tech. Rep. N7334 (Jul. 2005).
  • (28) M. F. Lòpez, V. G. Ruiz, I. García, Efficiency of closed and open-loop scalable wavelet based video coding, Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science 4678 (10) (2007) 800–809.
  • (29) W. Ding, J. Hu, L. Zhang, Optimal 3D-SPIHT video coding method by reducing redundancy between trees, Journal of Computer-Aided Design & Computer Graphics 17 (3) (2005) 563–569.
  • (30) C. C. Cheng, G. J. Peng, W. L. Hwang, Subband weighting with pixel connectivity for 3-D wavelet coding, IEEE Transactions on Image Processing 18 (1) (2009) 52–62.
  • (31) P. Chen, J. W. Woods, Improved MC-EZBC with quarter-pixel motion vectors, Tech. rep., ISO/IEC JTC 1/SC 29/WG 11, Fairfax, VA.
  • (32) R. Xiong, J. Xu, F. Wu, S. Li, Barbell-lifting based 3-D wavelet coding scheme, IEEE Transactions on Circuits and Systems for Video Technology 17 (9) (2007) 1256–1269.
  • (33) S. Fang, Y. Zhong, 3D subband codec with full scalability, Mini-Micro Systems 26 (7) (2005) 1260–1263.
  • (34) T. J, H. Wang, J. Zhang, Z. Jiang, Research on the scalability of 3D wavelet video coding, Mini-Micro Systems 26 (2) (2005) 285–288.
  • (35) Z. Chang, L. Zhuo, L. Shen, An improved motion-compensated three dimension wavelet video coding method, Journal of Circuits and Systems 11 (1) (2006) 113–117, 121.
  • (36) ISO/IEC JTC 1/SC 29/WG 11, Wavelet video coding - an overview, Doc. W7824, Bangkok, Thailand (2006).
  • (37) ISO/IEC JTC 1/SC 29/WG 11, Status report - version 1 on wavelet video coding exploration, Doc. N7822, Bangkok, Thailand (2006).
  • (38) X. Lu, G. R. Martin, Performance comparison of the SVC, WSVC, and Motion JPEG2000 advanced scalable video coding schemes, in: Proc. IET, Intelligent Signal Processing, Vol. 1, London, 2013, pp. 1–6.
  • (39) D. Taubman, High performance scalable image compression with EBCOT, IEEE Transactions on Image Processing 9 (7) (2000) 1158–1170.
  • (40) J. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Transactions on Signal Processing 41 (12) (1993) 3445–3462.
  • (41) A. Said, W. A. Pearlman, A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Transactions on Circuits and Systems for Video Technology 6 (3) (1996) 243–250.
  • (42) B. Chai, J. Vass, X. Zhuang, Significance link connected component analysis for wavelet image coding, IEEE Transactions on Image Processing 8 (6) (1999) 774–784.
  • (43) W. A. Pearlman, A. Islam, N. Nagaraj, A. Said, Efficient low-complexity image coding with set-partitioning embedded block coder, IEEE Transactions on Circuits and Systems for Video Technology 14 (11) (2004) 1219–1235.
  • (44) S. T. Hsiang, J. W. Woods, Embedded image coding using zeroblocks of subband/wavelet coefficient and context modeling, in: Proc. IEEE International Symposium on Circuit and Systems (ISCAS’00), Vol. 3, Geneva, Switzerland, 2000, pp. 662–665.
  • (45) A. A. Moinuddin, E. Khan, M. Ghanbari, The impact of tree structures on the performance of zerotree-based wavelet video codecs, Signal Processing: Image Communication 25 (3) (2010) 179–195.
  • (46) J. E. Fowler, B. Pesquet-Popescu, An overview on wavelets in source coding, communications, and networks, EURASIP Journal on Image and Video Processing 2007 (1) (2007) 1–27.
  • (47) P. Campisi, M. Gentile, A. Neri, Three-dimensional wavelet-based approach for a scalable video conference system, in: Proc. IEEE International Conference on Image Processing (ICIP’99), Vol. 3, Kobe, Japan, 1999, pp. 802–806.
  • (48) J. Vass, B. Chai, X. Zhuang, 3-D SLCCA—a highly scalable very low bit-rate software-only wavelet video codec, in: Proc. IEEE Second Workshop Multimedia Signal Processing, Vol. 3, Redondo Beach, CA, 1998, pp. 474–479.
  • (49) B. J. Kim, W. A. Pearlman, An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT), in: Proc. Data Compression Conference, Vol. 1, Snowbird, USA, 1997, pp. 251–260.
  • (50) J. Xu, Z. Xiong, S. Li, Y.-Q. Zhang, Three-dimensional embedded subband coding with optimal truncation (3-D ESCOT), Applied Computational Harmonic Analysis 10 (3) (2001) 290–315.
  • (51) Y. Chen, W. A. Pearlman, Three-dimensional subband coding of video using the zerotree method, in: Proc. SPIE Visual Communications and Image Processing, Vol. 2727, Snowbird, USA, 1996, pp. 1302–1312.
  • (52) H. Khalil, F. A. F, S. I. Shaheen, Lowering frame-buffering requirements of 3-D wavelet transform coding of interactive video, in: Proc. IEEE International Conference on Image Processing (ICIP’99), Vol. 3, Kobe, Japan, 1999, pp. 852–856.
  • (53) G. Minami, Z. Xiong, A. Wang, P. A. Chou, S. Mehrotra, 3-D wavelet coding of video with arbitrary regions of support, IEEE Transactions on Circuits and Systems for Video Technology 11 (9) (2001) 1063–1068.
  • (54) C. He, J. Dong, Y. Zheng, Z. Gao, Optimal 3-D coefficient tree structure for 3-D wavelet video coding, IEEE Transactions on Circuits and Systems for Video Technology 13 (10) (2003) 961–972.
  • (55) E. Khan, M. Ghanbari, Very low bit rate video coding using virtual SPIHT, Electronics Letters 37 (1) (2001) 40–41.
  • (56) L. Zhang, D. Wang, A. Vincent, Decoupled 3-D zerotree structure for wavelet-based video coding, IEEE Transactions on Broadcasting 54 (3) (2008) 430–436.
  • (57) C.-M. Song, X.-H. Wang, F. Zhang, Visually lossless accuracy of motion vector in overcomplete wavelet-based scalable video coding, Journal of Computers 4 (9) (2009) 821–828.
  • (58) A. A. Moinuddin, E. Khan, M. Ghanbari, Efficient and embedded 3-D wavelet video coding, in: Proc. TENCON 2008, Vol. 1, Hyderabad, 2008, pp. 1–4.
  • (59) B. Usevitch, Optimal bit allocation for biorthogonal wavelet coding, in: Proc. Data Compression Conference, Vol. 1, Snowbird, USA, 1996, pp. 387–395.
  • (60) B. E. Usevitch, A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000, IEEE Signal Processing Magazine 18 (5) (2001) 22–35.
  • (61) Y. Cho, W. A. Pearlman, Quantifying the coding performance of zerotrees of wavelet coefficients: degree-k zerotree, IEEE Transactions on Signal Processing 55 (6) (2007) 2425–2431.