Low Complexity Trellis-Coded Quantization in Versatile Video Coding

08/26/2020 ∙ by Meng Wang, et al. ∙ City University of Hong Kong, ByteDance Inc., Peking University

The forthcoming Versatile Video Coding (VVC) standard adopts trellis-coded quantization, which uses a carefully designed trellis graph to map the quantization candidates within one block onto an optimal path. Despite its high compression efficiency, the complex trellis search with soft decision quantization may hinder practical applications because of its high complexity and low throughput. To reduce the complexity, in this paper we propose a low complexity trellis-coded quantization scheme grounded in theoretical modeling of the rate and distortion. With these models, the trellis departure point can be adaptively adjusted and unnecessarily visited branches can be pruned, which shrinks the total number of trellis stages and simplifies the transition branches. Extensive experimental results on the VVC test model show that the proposed scheme is effective in reducing the encoding complexity by 11% […] configurations, respectively, at the cost of only 0.11% […] increase. Meanwhile, on average 24% […] achieved under all intra and random access configurations. Owing to this performance, the VVC test model has adopted one implementation of the proposed scheme.

I Introduction

Recent years have witnessed the rapid development of video coding technologies. Newly adopted coding tools afford more mode options to cope with the various characteristics of video sequences, leading to significant improvements in coding efficiency. Video coding standards such as H.264/AVC [27], HEVC [21], AVS [9] and VVC [4] specify the semantics of the decoding process, leaving room for encoder optimization and complexity reduction. To coordinate the behavior of individual or multiple coding tools, rate distortion optimization (RDO) [22] is employed throughout the encoding stages. Consequently, the optimal combination of encoding modes and parameters can be systematically determined by RDO, which further improves compression performance. Given the maximum allowed rate $R_c$, the aim of RDO is to minimize the encoding distortion $D$ by attempting different combinations of encoding parameters and modes from the set $\mathcal{P}$, which can be described as follows,

$$\min_{\mathcal{P}} D \quad \text{s.t.} \quad R \le R_c. \tag{1}$$

Herein, Lagrangian optimization [11] is used to convert the constrained problem in Eqn. (1) to an unconstrained one,

$$\min_{\mathcal{P}} J, \quad J = D + \lambda R, \tag{2}$$

where $J$ is the RD cost and $\lambda$ is the Lagrange multiplier. $R$ is the number of bits and $D$ indicates the distortion. In general, the actual encoding procedures such as transform, quantization, entropy coding, inverse quantization and inverse transform are obligatory to obtain $R$ and $D$ for an individual mode. Furthermore, numerous RD cost calculations must be performed for different candidate modes and combinations. This imposes an extremely heavy computational burden on the encoder, which may impede the implementation and application of new video coding standards in real scenarios.
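The Lagrangian selection of Eqn. (2) can be illustrated with a short sketch. The `ModeResult` container, the mode names and the numeric values are hypothetical; in a real encoder the distortion and rate of each mode come from the actual encoding loop.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    """Hypothetical container for the outcome of trying one coding mode."""
    name: str
    distortion: float  # e.g. sum of squared errors
    rate_bits: float   # number of coded bits


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian RD cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_mode(candidates, lam):
    """Return the candidate with the smallest RD cost."""
    return min(candidates, key=lambda m: rd_cost(m.distortion, m.rate_bits, lam))


candidates = [
    ModeResult("mode_a", distortion=100.0, rate_bits=10.0),
    ModeResult("mode_b", distortion=60.0, rate_bits=40.0),
]
# A small multiplier favours low distortion, a large one favours low rate.
low_lambda_choice = select_mode(candidates, lam=1.0)
high_lambda_choice = select_mode(candidates, lam=2.0)
```

The sketch also shows why RDO is costly: every candidate needs its own distortion and rate before the comparison can be made.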

Typically, there are three ways to reduce the encoding computational complexity. The first achieves a low-level speedup by employing single-instruction multiple-data (SIMD) instructions [1], targeting specific modules with data-intensive operations such as intra prediction, motion compensation, transform and filtering. As a result, the time consumed by each mode attempt can be reduced without any change in coding performance. The second way alleviates the encoder burden by pruning the mode attempts, which reduces the number of RDO rounds. More specifically, improbable mode candidates are inferred experimentally or theoretically and are directly ignored by skipping the RD cost calculation or comparison, leading to savings in the overall encoding time [31]. However, the remaining modes must still be evaluated with RDO. The third way focuses on decreasing the complexity of the RD calculation itself, wherein the rate and distortion are estimated instead of being obtained by exhaustively going through the full encoding workflow [32, 15].

Quantization, whose quality is measured in terms of the fidelity of the reproduced signal relative to the original as well as the resulting representation cost, has evolved rapidly in video coding standards. Soft decision quantization (SDQ) [26, 30, 13] introduces rate-distortion optimization into the determination of quantization levels, which improves the coding efficiency but also raises the computational complexity compared to conventional hard decision quantization (HDQ) [23]. During the standardization of H.264/AVC [27], HEVC [21] and AVS2 [9], a classical SDQ method, rate distortion optimized quantization (RDOQ) [13], was adopted and desirable coding performance was achieved. However, the computational complexity of RDOQ becomes a barrier, since entropy coding must be performed for each candidate along with context model updating. In VVC, besides RDOQ, trellis-coded quantization (TCQ) [20], which is also termed dependent quantization, is adopted. With TCQ, quantization candidates are arranged in a block-level trellis graph together with state transitions, so that finding the optimal quantization solution becomes an optimal path search. As such, the statistical dependencies among quantization outcomes within one coded block can be exploited. HDQ only conducts the quantization without considering the influence of coding bits, and RDOQ attends to the optimal RD behavior of the coefficient currently being processed. By contrast, TCQ maps the coefficients to the trellis graph by employing a vector quantizer, and seeks the path with the minimum RD cost as the optimal quantization solution. TCQ achieves superior coding performance compared to RDOQ: 3.5% and 2.4% bit-rate savings are reported under the all intra (AI) and random access (RA) configurations, respectively, on the VTM-1.0 platform [19]. However, a significant increase in encoding complexity has also been observed, which is attributed to the RD calculation, accumulation and comparison for each stage and each node during TCQ.

Many studies have addressed the complexity of sophisticated quantization processes. Huang and Chen [10] presented an analytical method to address the RDO-based quantization problem for H.264/AVC. In [15], models of the variation of rate and distortion were investigated, with which improbable quantization candidates in RDOQ can be efficiently excluded. In [8], transform coefficients are modeled with a Laplacian distribution, which can be further utilized to deduce the block-level RD performance for RDOQ. A trellis-coded quantization method was studied in [30] for H.264/AVC, where all potential candidates along with coding contexts are mapped into the trellis graph, leading to an improvement in coding efficiency. However, the computational complexity of the optimal path search is extremely high for both software and hardware implementations. To tackle this problem, Yin et al. [32] proposed a fast soft decision quantization algorithm that distinguishes safe from unsafe quantization levels based on the estimated variations of rate and distortion.

Though the previous works are effective in lessening the quantization complexity for H.264/AVC and HEVC, they are not applicable to the trellis-based quantization in VVC. Herein, we propose a low complexity TCQ scheme for VVC based on theoretical modeling of the rate and distortion. With the proposed models, the RD performance of different quantization candidates can be effectively evaluated, and the computationally intensive search in TCQ can be safely reduced. In particular, the trellis departure point is adaptively determined, by which the total number of trellis stages can be reduced. Moreover, a branch pruning scheme is investigated based on the RD models, which is conducive to decreasing the operation complexity of TCQ.

II Statistical Rate and Distortion Models

In the literature, rate and distortion models have been statistically established according to the probability distribution of transform coefficients [18, 33, 14, 12, 35, 16]. The distribution of transform coefficients has been studied for several decades, with candidates including the Laplacian distribution [14], [16], the Cauchy distribution [12], the generalized Gaussian distribution [35] and combined distributions [29]. The generalized Gaussian distribution offers the best modeling accuracy owing to its flexible shape and scale parameters. However, these parameters are difficult to estimate, which significantly hinders its application. In addition, the Cauchy distribution may not be appropriate for the RD modeling task in the encoder since its mean and variance do not converge. By contrast, the Laplacian distribution is regarded as the best compromise between modeling complexity and accuracy [16]. Numerous rate and distortion models have been proposed, and they can be further applied to rate control, bit allocation and fast mode selection. A block level rate estimation scheme was presented in [35] to speed up the RDO selection for H.264/AVC, where each individual sub-band of transform coefficients is modeled with a generalized Gaussian distribution. In [12], frame-level bits are approximated and allocated by modeling the AC coefficients with the Cauchy distribution. Moreover, the rate model was established in the $\rho$-domain [34] based on the assumption of Gaussian and Laplacian distributions, where a linear relationship between the rate and the percentage of non-zero coefficients was derived. In [5, 28, 10, 6], the rate was also modeled with the $\ell_1$-norm of the quantized coefficients.

In this section, we develop rate and distortion models dedicated to the sophisticated quantization design in VVC, based on the Laplacian distribution. In particular, let $c_{i,j}$ be the transform coefficient located at position $(i,j)$ in a coding block of size $W \times H$; the scalar quantization is given by,

$$l_{i,j} = \operatorname{sign}(c_{i,j}) \left\lfloor \frac{|c_{i,j}|}{q} + f \right\rfloor, \tag{3}$$

where $l_{i,j}$ is the corresponding scalar quantized coefficient. The parameter $f$ controls the rounding offset, which takes a fixed value during the pre-quantization process in VVC and HEVC soft quantization. $q$ represents the quantization step size.

The Laplacian distribution is adopted here to model the transform residuals. In particular, the probability density function (PDF) of a transform coefficient $x$ is given by,

$$p(x) = \frac{\Lambda}{2} e^{-\Lambda |x|}, \tag{4}$$

where $\Lambda$ is the Laplacian parameter that can be determined from the standard deviation $\sigma$ as follows,

$$\Lambda = \frac{\sqrt{2}}{\sigma}. \tag{5}$$
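As a minimal sketch of the quantities defined in Eqn. (3)-(5), the following Python functions implement a dead-zone scalar quantizer and the Laplacian model. The function names, the sample values and the choice of a rounding offset of 0.5 in the example are assumptions made for illustration only.

```python
import math
import statistics


def scalar_quantize(c, q, f):
    """Quantize coefficient c with step size q and rounding offset f."""
    level = math.floor(abs(c) / q + f)
    return int(math.copysign(level, c)) if level else 0


def laplacian_parameter(coeffs):
    """Laplacian parameter from the standard deviation: sqrt(2) / sigma."""
    sigma = statistics.pstdev(coeffs)
    return math.sqrt(2.0) / sigma


def laplacian_pdf(x, lam):
    """Zero-mean Laplacian density with parameter lam."""
    return 0.5 * lam * math.exp(-lam * abs(x))


# Hypothetical coefficients of one block.
coeffs = [7.4, -7.4, 0.8, -1.0, 1.0]
levels = [scalar_quantize(c, q=2.0, f=0.5) for c in coeffs]
```

A smaller offset `f` widens the dead zone around zero, so more coefficients are quantized to zero.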

II-A Relationship between Rate and $\ell_0$-norm of Coefficients

In general, given a certain allowed distortion level , based on Shannon’s source coding theorem, the minimum bits for coding a symbol can be derived as,

(6)

As such, for a preset , the associated distortion is given by,

(7)

where

(8)
(9)

Herein, represents the quantization level. As such, and can be derived as follows,

(10)
(11)

where . Therefore, can be represented as,

(12)

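Given the PDF $p(x)$ of the transform coefficients, the percentage $\rho$ of non-zero quantized coefficients can be estimated as follows,

$$\rho = 1 - \int_{-(1-f)q}^{(1-f)q} p(x)\,dx = e^{-\Lambda (1-f) q}, \tag{13}$$

where $q$ is the quantization step size, $f$ is the rounding offset and $\Lambda$ is the Laplacian parameter. Moreover, $\rho$ can also be represented with the $\ell_0$-norm,

$$\rho = \frac{\|\mathbf{l}\|_0}{W \cdot H}, \tag{14}$$

where $\rho$ should be within the range of [0,1] and $\|\mathbf{l}\|_0$ denotes the $\ell_0$-norm of the quantized coefficients in a coding block of size $W \times H$. Furthermore, the relationship between the coding bits and the percentage of non-zero quantized coefficients can be obtained by substituting Eqn. (12) and Eqn. (13) into Eqn. (6) as follows,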

(15)

The Taylor expansion of Eqn. (15) can be expressed as,

(16)

As such, we could have the following relationship,

(17)

The relationship is approximately locally linear with respect to $\rho$, which corresponds to the $\rho$-domain model [34].
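The fraction of non-zero quantized coefficients can be checked numerically. The sketch below assumes a zero-mean Laplacian source with parameter `lam` and a quantizer of the form floor(|c|/q + f); under these assumptions the predicted fraction is exp(-lam * (1 - f) * q). All names and parameter values are hypothetical.

```python
import math
import random


def predicted_nonzero_fraction(lam, q, f):
    """Closed-form fraction of non-zero levels for a Laplacian source."""
    return math.exp(-lam * (1.0 - f) * q)


def measured_nonzero_fraction(coeffs, q, f):
    """l0-norm of the quantized coefficients divided by their count."""
    nonzero = sum(1 for c in coeffs if math.floor(abs(c) / q + f) != 0)
    return nonzero / len(coeffs)


def sample_laplacian(n, lam, seed=0):
    """Draw n Laplacian samples: exponential magnitude with a random sign."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(lam) for _ in range(n)]


samples = sample_laplacian(20000, lam=0.5)
measured = measured_nonzero_fraction(samples, q=2.0, f=0.5)
predicted = predicted_nonzero_fraction(lam=0.5, q=2.0, f=0.5)
```

With enough samples the measured and predicted fractions agree closely, which is the property the $\rho$-domain view relies on.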

II-B Relationship between Rate and $\ell_1$-norm of Coefficients

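Herein, we further model the rate from the perspective of self-information [7]. In particular, the self-information of a quantized symbol is given by,

$$I(k) = -\log_2 P(k), \tag{18}$$

where $P(k)$ denotes the probability of the scalar quantization result $l_{i,j}$ being equal to $k$. Given the quantization step $q$ and the rounding offset $f$, $P(k)$ is represented as follows,

$$P(k) = \begin{cases} \int_{-(1-f)q}^{(1-f)q} p(x)\,dx, & k = 0, \\ \int_{(|k|-f)q}^{(|k|+1-f)q} p(x)\,dx, & k \neq 0, \end{cases} \tag{19}$$

where $p(x)$ is the Laplacian PDF of Eqn. (4) with parameter $\Lambda$. By integrating the Laplacian distribution into Eqn. (19), the probability of the quantized symbol can be expressed as,

$$P(k) = \begin{cases} 1 - e^{-\Lambda (1-f) q}, & k = 0, \\ \frac{1}{2} e^{-\Lambda (|k|-f) q} \left(1 - e^{-\Lambda q}\right), & k \neq 0. \end{cases} \tag{20}$$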

With Eqn. (4), Eqn. (18) and Eqn. (20), the rate of the quantized symbol can be approximated. For the case of , can be estimated as,

(21)

where

(22)

For the case of , can be approximated as,

(23)

where

(24)

As such, the total coding bits of a coding block can be expressed as,

(25)

where represents the combination of and . In this regard, the number of coding bits of a block is determined by the -norm of the coefficients.
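The link between self-information and the $\ell_1$-norm can be illustrated numerically. The sketch below assumes a zero-mean Laplacian source with parameter `lam` and a quantizer of the form floor(|c|/q + f); the closed-form probabilities follow from integrating the Laplacian PDF over each quantization interval. Function names and parameter values are hypothetical.

```python
import math


def symbol_probability(k, lam, q, f):
    """Probability that a Laplacian coefficient is quantized to level k."""
    if k == 0:
        return 1.0 - math.exp(-lam * (1.0 - f) * q)
    return 0.5 * math.exp(-lam * (abs(k) - f) * q) * (1.0 - math.exp(-lam * q))


def self_information_bits(k, lam, q, f):
    """Self-information -log2 P(k) of a quantized level."""
    return -math.log2(symbol_probability(k, lam, q, f))


def block_bits(levels, lam, q, f):
    """Sum of the self-information over all levels of a block."""
    return sum(self_information_bits(k, lam, q, f) for k in levels)


# For non-zero levels the self-information grows linearly with |k|,
# with slope lam * q * log2(e) bits per unit of magnitude.
slope = self_information_bits(3, 0.5, 2.0, 0.5) - self_information_bits(2, 0.5, 2.0, 0.5)
```

Because each non-zero level contributes a term proportional to its magnitude, the block total is an affine function of the $\ell_1$-norm for a fixed number of non-zero coefficients.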

Fig. 1: Illustration of the Taylor expansion and its first-order, second-order and third-order approximations with respect to $\rho$.
Fig. 2: Illustration of the actual coding bits, $\ell_0$-norm, $\ell_1$-norm and estimated coding bits for the sequence “RaceHorses”. (a-d) Actual coding bits versus $\ell_0$-norm of the quantized transform coefficients; (e-h) actual coding bits versus $\ell_1$-norm of the quantized transform coefficients; (i-l) actual coding bits versus estimated coding bits.

II-C Rate modeling

The $\ell_0$-norm and $\ell_1$-norm of the coefficients play complementary roles in approximating the number of coding bits, and employing only one of them may lead to a biased approximation. First, the low bit rate assumption made by Eqn. (16) may not always hold. As illustrated in Fig. 1, when $\rho$ is beyond 50%, corresponding to high bit rate coding, the actual bits could be underestimated when adopting the $\ell_0$-norm only. In addition, the $\ell_0$-norm estimates the rate in a statistical manner without considering the individual level of each coefficient. Obviously, larger coefficients should consume more coding bits, which cannot be well characterized by the $\ell_0$-norm only. This provides further evidence for incorporating the $\ell_1$-norm. On the other hand, the $\ell_1$-norm only pays attention to the individual coding elements but ignores the dependencies and context in the coefficient coding process. Therefore, we equip the rate model with both the $\ell_0$-norm and the $\ell_1$-norm. Moreover, the position information, especially the location of the last non-zero coefficient, also influences the final coding bits. As such, the final rate model is given by,

(26)

where the model parameters that control the relationship between the rate and the $\ell_0$- and $\ell_1$-norms of the quantized coefficients depend strongly on the QP and the block size, and both norms are computed over the current CU. In VVC, each coordinate of the last significant coefficient is composed of a prefix and a suffix, wherein the prefix is context coded with truncated unary bins and the suffix is bypass coded with fixed-length bins. Here, the coding bits of the coordinates of the last non-zero coefficient are obtained in practice from a look-up table. The relationship between the actual coding bits and the estimated coding bits is illustrated in Fig. 2 for various QPs, showing that the model delivers high accuracy in modeling the rate in VVC.
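A rate estimate that combines both norms with the last-position bits might look like the sketch below. The linear form, the parameter names `alpha` and `beta`, and the example values are assumptions for illustration; the exact model and its trained parameters are those of Eqn. (26), which depend on the QP and block size.

```python
def estimate_block_bits(levels, alpha, beta, last_pos_bits):
    """Assumed linear rate estimate from the l0-norm, l1-norm and last-position bits.

    levels        -- quantized coefficients of the current block
    alpha, beta   -- hypothetical weights of the l0-norm and l1-norm
    last_pos_bits -- bits for the last non-zero position (e.g. from a look-up table)
    """
    l0_norm = sum(1 for v in levels if v != 0)
    l1_norm = sum(abs(v) for v in levels)
    return alpha * l0_norm + beta * l1_norm + last_pos_bits


estimated = estimate_block_bits([3, 0, -1, 0], alpha=2.0, beta=1.5, last_pos_bits=4.0)
```

An estimate of this kind needs only two sums over the block, which is far cheaper than running the entropy coder for every candidate.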

II-D Distortion modeling

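To measure the quantization distortion, the sum of squared errors (SSE) is adopted, and the SSE of a coding block can be straightforwardly represented as follows,

$$D = \sum_{i,j} \left( c_{i,j} - Q^{-1}(l_{i,j}) \right)^2, \tag{27}$$

where $c_{i,j}$ is the transform coefficient at position $(i,j)$, $l_{i,j}$ is its quantized level and $Q^{-1}(\cdot)$ indicates the inverse quantization.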

III Problem Formulation

Scalar quantization has been widely used in video coding standards owing to its computational simplicity, as it generally employs one quantizer associated with a specific quantization step. TCQ, which was studied as early as 1990 [17], can be regarded as large-dimension vector quantization with constrained vector components and is capable of remedying, to some extent, the inevitable performance loss incurred by scalar quantization. The TCQ in VVC is implemented as dependent scalar quantization that simultaneously maintains two quantizers, $Q_0$ and $Q_1$, with four transition states. To be more specific, TCQ embeds the quantization candidates of one block into a trellis graph wherein the best quantization outcome corresponds to the path with the minimum RD cost. In this manner, the inter-dependencies of transform coefficients can be well exploited, and moreover, TCQ packs the admissible reconstruction vectors more densely by means of the additional quantizers and candidates. As such, significantly better RD performance can be achieved.

More specifically, in TCQ of VVC, given a transform coefficient, several quantization candidates can be obtained based on the pre-quantization results. Subsequently, each quantization level is further converted into a quantization index $k$. Since the magnitude of the quantization index is nearly half of the original quantization level, coding bits are naturally saved. The parity of the current quantization index, together with the current state, determines the state transition route and the quantizer for the next coefficient, as illustrated in Fig. 3. Consequently, the reconstruction levels of the first quantizer $Q_0$ are always even multiples of the quantization step $q$, and those of the second quantizer $Q_1$ are odd multiples of $q$. The reconstruction process is illustrated in Algorithm 1, where $N$ denotes the number of quantization indices corresponding to the trellis stages within one coding block and $n$ represents the processing order.
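The reconstruction loop of Algorithm 1 can be sketched in Python as follows. The state transition table and the `state >> 1` quantizer selection follow the dependent quantization design of VVC as we understand it; the function and variable names are illustrative and are not taken from the VVC test model.

```python
# Next state for each (current state, parity of the quantization index).
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def dequantize_tcq(indices, step):
    """Reconstruct coefficients from quantization indices in coding order.

    States 0 and 1 use the first quantizer (even multiples of step),
    states 2 and 3 use the second quantizer (odd multiples of step, or zero).
    """
    state = 0
    reconstructed = []
    for k in indices:
        reconstructed.append((2 * abs(k) - (state >> 1)) * sign(k) * step)
        state = STATE_TRANSITION[state][k & 1]
    return reconstructed


recon = dequantize_tcq([1, 1, 0, -2], step=1.0)
```

In this example the second index is decoded with the second quantizer because the parity of the first index moved the state machine from state 0 to state 2.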

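The quantization distortion and rate of each individual coefficient are calculated and recorded for each trellis node following the scanning order. Given the current quantization index $k_n$ and transition state $s_n$, the quantization level can be reconstructed as follows,

$$l'_n = \left( 2|k_n| - (s_n \gg 1) \right) \cdot \operatorname{sgn}(k_n), \tag{28}$$

where $s_n \gg 1$ specifies the utilized quantizer and in turn introduces even or odd multiples of the quantization step $q$. As such, the distortion is given by [20],

$$D_n = \left( c_n - l'_n \cdot q \right)^2, \tag{29}$$

where $c_n$ is the $n$-th transform coefficient in scanning order.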

The absolute value of the quantization index is entropy coded by signaling a set of syntax elements in regular mode, and the remaining levels are binarized with a Golomb-Rice code and coded in bypass mode. Besides the aforementioned four states involved in the state transition loop, a special state termed the “uncoded” state is introduced, which attempts to truncate the residuals located in the high frequency region in an effort to further save coding bits. An exemplified trellis graph of one coding block is illustrated in Fig. 4. The switch from the “uncoded” state to State 0 or State 2 is only allowed when encountering non-zero quantization indices.

Following the scanning order, the cost of each individual stage is accumulated along the transition path until attaining the end of the block. Typically, there are multiple enter-paths with different accumulative costs attaining to the same node. Only one path with the lowest cost is retained as the survivor path. Considering the reverse scanning order, the cost accumulating can be described as follows,

(30)

In particular, if the state transition is from one “uncoded” state to another “uncoded” state, the RD cost is iterated as follows,

(31)

By contrast, if the state switches from “uncoded” state to State 0 or State 2, can be calculated as follows,

(32)

where denotes the bits used for representing the variation of the coded block flag (, from 0 to 1). denotes the bits regarding to the position of the last (first traversed) non-zero coefficient, the coordinators of which are represented with and .

Such cost accumulation, path comparison and optimal branch selection can be regarded as the add-compare-select (ACS) process [32]. Moreover, the RD cost calculation is conducted in the branch metric unit (BMU). Essentially, the goal of TCQ is to find the quantization solution that achieves the minimum RD cost for the whole coding block, and the Viterbi algorithm is used for the optimal path search. Multiple routes are available at each stage, where each route represents the state transition induced by a quantization index, and each individual transform coefficient can be regarded as one stage of the trellis graph. After attempting all the paths linked to one node, only the path with the lowest accumulated RD cost that connects to the destination node is retained.
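The add-compare-select search can be sketched with a simplified four-state Viterbi pass. This is an illustration under strong assumptions: the rate of each index is a hypothetical proxy (`proxy_rate_bits`) rather than the context-coded rate, the “uncoded” state is omitted, and only zero plus the two indices bracketing the coefficient are tried per state. None of the names come from the VVC test model.

```python
import math

STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def reconstruct(k, state, step):
    """Reconstruction value of index k under the quantizer selected by state."""
    sgn = (k > 0) - (k < 0)
    return (2 * abs(k) - (state >> 1)) * sgn * step


def candidate_indices(c, state, step):
    """Zero plus the two indices whose reconstructions bracket c."""
    sgn = 1 if c >= 0 else -1
    k_low = int((abs(c) / step + (state >> 1)) // 2)
    return {0, sgn * k_low, sgn * (k_low + 1)}


def proxy_rate_bits(k):
    """Hypothetical rate proxy: one bit for zero, more bits for larger indices."""
    return 1.0 + 2.0 * abs(k) if k else 1.0


def tcq_viterbi(coeffs, step, lam):
    """Return (indices, cost) of the minimum RD cost path through the trellis."""
    cost = [0.0, math.inf, math.inf, math.inf]  # the trellis starts in state 0
    paths = [[], None, None, None]
    for c in coeffs:
        new_cost = [math.inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if math.isinf(cost[state]):
                continue
            for k in candidate_indices(c, state, step):
                # Branch metric: distortion plus weighted rate.
                branch = (c - reconstruct(k, state, step)) ** 2 + lam * proxy_rate_bits(k)
                total = cost[state] + branch              # add
                nxt = STATE_TRANSITION[state][k & 1]
                if total < new_cost[nxt]:                 # compare and select
                    new_cost[nxt] = total
                    new_paths[nxt] = paths[state] + [k]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=cost.__getitem__)
    return paths[best], cost[best]


indices, total_cost = tcq_viterbi([2.0, 1.0], step=1.0, lam=0.0)
```

Every stage evaluates each reachable state against each candidate, which is why both the number of stages and the number of branches per node drive the overall cost.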

The complexity of TCQ is mainly attributed to three factors. The first is the total number of trellis stages, which generally corresponds to the number of coefficients within one block. The second is the number of branches linked to a node, which depends on the number of quantization candidates. The third is the number of states at each stage. Since the quantization indices are grouped based on parity, the number of branches linked to an individual node is halved compared to the fully connected trellis, and candidate “0” should be additionally counted. Given the number of quantization candidates available for a single transform coefficient, the branch complexity can be described as,

(33)

For a typical coding block with coefficients, the total branch count of TCQ can be formulated as,

(34)

Consequently, the complexity is proportional to the number of stages and the number of branches. The theoretical computational complexity is summarized in Table I. To address these three factors while maintaining the efficiency of TCQ, we apply the established rate and distortion models to achieve low complexity TCQ for VVC.

Module Branch BMU ACS
Distortion Rate Add Compare Select
TCQ
TABLE I: Complexity analyses of TCQ
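Algorithm 1 Reconstruction of transform coefficients with trellis coded quantization [4]
1: Input: quantization indices $k_n$, $n = 0, \dots, N-1$; quantization step $q$;
2: Dequantization results: $c'_n$
3: Initialize state: $s = 0$;
4: for $n = 0$; $n < N$; $n = n + 1$ do
5:     $c'_n = (2|k_n| - (s \gg 1)) \cdot \operatorname{sgn}(k_n) \cdot q$;
6:     $s = \text{stateTransTable}[s][k_n \,\&\, 1]$
7: end for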
Fig. 3: Illustration of the state transition [4].
Fig. 4: Illustration of trellis graph [20].
Fig. 5: Illustration of the position distribution of the last non-zero coefficient index in 16×16 coding blocks under the AI configuration of VVC, panels (a)-(d). Both HDQ and TCQ are considered, where HDQ corresponds to the orange bars and TCQ to the blue bars.

IV Low Complexity Trellis-Coded Quantization

IV-A Determination of the Trellis Departure Point

We first propose to softly decide the trellis departure point based on the rate and distortion models, with the goal of shrinking the total number of trellis stages. Quantization and residual coefficient coding are conducted per coefficient group (CG) in reverse scanning order. As such, coefficients from the bottom-right to the top-left of a coding block are mapped in order into the trellis along with the accessible states. The starting point, which serves as the first non-zero point during the traversal of the trellis graph, plays a critical role in TCQ. It is widely acknowledged that coefficients located in the high frequency regions tend to be quantized to zero in the sense of rate-distortion optimization. To verify this, statistical experiments are conducted to examine the distribution of the last non-zero coefficient position with HDQ and TCQ in 16×16 coding blocks, as illustrated in Fig. 5. It can be observed that the positions of the last non-zero coefficient are concentrated on smaller scan indices with TCQ compared to HDQ. As the QP increases, this difference in distribution becomes more apparent.

As such, by combining the rate-distortion models and TCQ, we propose an algorithm that determines the trellis departure point accurately and at low cost. More specifically, we cast this problem as a comparison of RD costs: the RD cost difference associated with two contiguous non-zero quantized coefficients can be measured and compared to determine the optimal trellis starting point. With the proposed algorithm, the initial point of the trellis graph can be postponed, which reduces the total number of stages as well as the computational complexity of determining the optimal quantized coefficients of the given coding block.

Supposing and are two typical positions which satisfy the following constraints,

(35)

Here, we use and to represent the absolute value of and , respectively. Since coefficient is ahead of during traversing, the quantization result should be zero if position serves as the initial point of trellis.

The rate estimation model in Eqn. (26) is employed where the total rates of a block can be represented with and when position and serve as the trellis initial point, respectively. As such, the rate difference is formulated as follows,

(36)

where denotes the quantization index of .

Regarding the distortion, we further formulate with and , where denotes the overall distortion when the coefficient at position is regarded as the trellis starting point. The derivations of and are given by,

(37)
(38)

As such, can be expressed as,

(39)

The RD cost difference that characterizes the variation of RD cost by removing out of the trellis can be derived as follows,

(40)

The non-zero coefficient at position can be eliminated from the trellis according to . In other words, if , it is unnecessary to involve the coefficient at position in the trellis.

As such, by combining Eqn. (36), Eqn. (39) and Eqn. (40), an RD-based threshold with respect to can be derived as follows,

(41)

where

(42)

Assuming that the rate for representing the last position is larger than or equal to that of position , by approximating with , with , and substituting with , can be simplified as follows,

(43)

where is a multiplication factor that can be obtained according to the VVC configuration. The theoretical minimum value of can be obtained as follows,

(44)

where

(45)

Herein, the minimum value of is adopted as the threshold. If the condition of Eqn. (41) is satisfied, the associated quantization coefficient can be directly determined as zero and further removed from the trellis graph. In this way, the trellis starting point can be postponed to the next non-zero coefficient. Otherwise, is regarded as the starting point of the trellis graph.
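The idea of postponing the departure point can be sketched with a direct RD comparison. Note that this sketch does not use the closed-form threshold of Eqn. (41)-(45); it simply compares, for each non-zero pre-quantized level in reverse scanning order, the distortion added by zeroing the coefficient against the bits saved under an assumed linear rate model with hypothetical weights `alpha` and `beta`. The last-position bits are ignored for brevity.

```python
def find_departure_point(coeffs, levels, q, lam, alpha, beta):
    """Return the index (in reverse scanning order) where the trellis should start.

    coeffs -- transform coefficients in reverse scanning order
    levels -- their pre-quantized levels, in the same order
    Returns None when every coefficient can be zeroed without increasing the RD cost.
    """
    for n, (c, level) in enumerate(zip(coeffs, levels)):
        if level == 0:
            continue
        # Distortion increase caused by forcing this coefficient to zero.
        delta_d = c * c - (abs(c) - abs(level) * q) ** 2
        # Bits saved according to the assumed linear rate model.
        delta_r = alpha + beta * abs(level)
        if delta_d - lam * delta_r > 0:
            return n  # zeroing would raise the RD cost: start the trellis here
    return None


coeffs = [0.2, 1.1, 0.0, 5.0]
levels = [0, 1, 0, 5]
start = find_departure_point(coeffs, levels, q=1.0, lam=1.0, alpha=1.0, beta=1.0)
```

All stages before the returned index are skipped, so the saving grows with the number of small high-frequency coefficients that the test removes.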

Fig. 6: Illustration of the trellis pruning for the larger quantization candidates.
Fig. 7: Illustration of the trellis pruning for the smaller quantization candidates.
Class Sequence AI RA
BD-Rate(Y) BD-Rate(U) BD-Rate(V) BD-Rate(Y) BD-Rate(U) BD-Rate(V)
A1 Tango2 0.07% 0.23% 0.32% 34% 13% 0.01% 0.20% 0.26% 32% 3%
FoodMarket4 0.09% 0.22% 0.16% 26% 10% 0.04% -0.14% -0.08% 19% 0%
Campfire 0.05% 0.09% 0.02% 24% 10% 0.01% 0.14% 0.10% 23% 5%
A2 CatRobot1 0.11% 0.18% 0.14% 28% 11% -0.02% 0.01% 0.13% 32% 2%
DaylightRoad2 0.16% 0.41% 0.20% 26% 11% 0.07% -0.14% -0.09% 33% 1%
ParkRunning3 0.03% 0.13% 0.17% 17% 7% 0.01% 0.12% 0.16% 21% 1%
B MarketPlace 0.03% 0.15% 0.38% 22% 10% 0.04% 0.23% 0.41% 26% 5%
RitualDance 0.06% 0.28% 0.15% 21% 9% 0.04% -0.22% -0.18% 22% 5%
Cactus 0.07% 0.09% 0.21% 22% 10% 0.03% 0.10% 0.29% 27% 6%
BasketballDrive 0.10% 0.10% 0.16% 27% 12% 0.05% -0.46% -0.01% 27% 5%
BQTerrace 0.06% 0.10% 0.23% 17% 8% 0.07% 0.79% 0.16% 23% 4%
C BasketballDrill 0.14% 0.05% 0.14% 23% 11% 0.05% -0.05% 0.07% 27% 7%
BQMall 0.08% 0.10% 0.15% 20% 10% 0.05% -0.35% -0.25% 23% 5%
PartyScene 0.09% -0.03% 0.15% 13% 6% 0.08% 0.10% 0.30% 19% 6%
RaceHorses 0.08% 0.05% 0.00% 17% 8% 0.03% 0.19% 0.21% 21% 5%
D BasketballPass 0.07% 0.09% 0.26% 18% 8% 0.08% -0.30% -0.07% 19% 4%
BQSquare 0.10% -0.02% -0.20% 12% 5% 0.00% -0.53% -1.09% 17% 3%
BlowingBubbles 0.11% -0.22% 0.37% 13% 7% 0.17% 0.69% 0.41% 20% 4%
RaceHorses 0.14% -0.25% -0.67% 15% 6% -0.05% -0.28% 0.55% 19% 3%
E FourPeople 0.11% 0.17% 0.20% 19% 9% - - - - -
Johnny 0.09% 0.40% 0.11% 21% 9% - - - - -
KristenAndSara 0.08% 0.05% 0.13% 20% 8% - - - - -
F BasketballDrillText 0.11% 0.07% 0.07% 19% 8% 0.05% 0.09% 0.10% 25% 6%
ArenaOfValor 0.11% 0.06% 0.18% 17% 4% 0.10% 0.09% 0.18% 22% 6%
SlideEditing 0.01% 0.09% 0.21% 7% 4% -0.03% 0.07% -0.02% 10% 2%
SlideShow 0.09% 0.50% 0.35% 12% 4% 0.18% -0.19% -0.12% 12% 2%
Class A1 0.07% 0.18% 0.16% 28% 11% 0.02% 0.06% 0.09% 25% 3%
Class A2 0.10% 0.24% 0.17% 23% 10% 0.02% 0.00% 0.07% 29% 1%
Class B 0.07% 0.14% 0.23% 22% 10% 0.04% 0.09% 0.13% 25% 5%
Class C 0.10% 0.04% 0.11% 18% 9% 0.05% -0.03% 0.08% 23% 5%
Class E 0.09% 0.21% 0.15% 20% 9% - - - - -
Overall 0.09% 0.15% 0.17% 22% 10% 0.03% 0.03% 0.08% 25% 4%
Class D 0.10% -0.10% -0.06% 14% 7% 0.05% -0.11% -0.05% 19% 3%
Class F 0.08% 0.18% 0.20% 14% 5% 0.07% 0.02% 0.04% 17% 4%
TABLE II: Performance of the proposed scheme by postponing the trellis departure point under AI and RA configurations
Class Sequence AI RA
BD-Rate(Y) BD-Rate(U) BD-Rate(V) BD-Rate(Y) BD-Rate(U) BD-Rate(V)
A1 Tango2 0.04% 0.03% 0.06% 3% 1% 0.01% -0.05% 0.27% 5% 4%
FoodMarket4 0.06% 0.03% 0.11% 2% 1% 0.00% -0.20% 0.09% 3% 4%
Campfire 0.01% 0.05% 0.01% 3% 1% 0.00% 0.05% -0.09% 4% 1%
A2 CatRobot1 0.04% 0.13% -0.06% 4% 1% -0.01% -0.12% 0.00% 4% 3%
DaylightRoad2 0.02% 0.21% 0.08% 4% 2% 0.01% 0.01% -0.04% 7% 3%
ParkRunning3 0.05% 0.05% 0.02% 3% 1% 0.01% 0.04% 0.02% 3% 7%
B MarketPlace 0.03% 0.07% 0.19% 2% 1% -0.01% 0.00% -0.22% 2% 2%
RitualDance 0.02% 0.30% -0.01% 3% 2% 0.01% 0.03% 0.00% 2% 2%
Cactus 0.03% -0.01% 0.11% 3% 1% 0.03% -0.23% -0.03% 3% 0%
BasketballDrive 0.04% 0.09% -0.07% 4% 1% 0.01% -0.48% -0.15% 3% 0%
BQTerrace 0.04% -0.15% 0.12% 2% 1% 0.05% 0.57% -0.53% 3% 0%
C BasketballDrill 0.03% -0.06% 0.09% 0% 0% 0.04% 0.07% -0.17% 2% 0%
BQMall 0.01% -0.04% -0.19% 3% 0% 0.03% -0.13% -0.26% 1% -1%
PartyScene 0.03% -0.01% -0.03% 0% 2% 0.02% 0.15% -0.28% 0% -1%
RaceHorses 0.06% -0.09% 0.02% 2% 0% 0.02% -0.21% 0.00% -1% -4%
D BasketballPass 0.03% 0.04% 0.10% 1% 1% -0.06% -1.01% 0.02% 0% 0%
BQSquare 0.06% -0.21% -0.10% 1% 0% -0.05% -0.87% -1.24% 0% 0%
BlowingBubbles 0.06% -0.11% -0.17% 0% 1% 0.09% 0.18% 0.13% 0% 0%
RaceHorses 0.04% 0.02% -0.44% 1% 1% -0.02% -0.79% -0.06% -1% 0%
E FourPeople 0.01% -0.02% 0.08% 3% 1% - - - - -
Johnny 0.03% 0.25% -0.11% 2% 0% - - - - -
KristenAndSara 0.01% -0.06% -0.01% 4% 1% - - - - -
F BasketballDrillText 0.04% -0.01% -0.01% 2% 1% 0.03% -0.14% 0.18% 2% -1%
ArenaOfValor 0.02% -0.02% 0.06% 2% 1% -0.01% 0.02% -0.06% -2% 0%
SlideEditing 0.04% 0.04% 0.10% 1% 1% -0.07% -0.01% -0.11% 2% 1%
SlideShow 0.11% -0.07% -0.22% 1% 0% 0.00% -0.37% -0.39% 2% 0%
Class A1 0.03% 0.04% 0.06% 3% 1% 0.01% -0.07% 0.09% 4% 3%
Class A2 0.03% 0.13% 0.01% 3% 2% 0.00% -0.02% -0.01% 5% 4%
Class B 0.03% 0.06% 0.07% 3% 1% 0.02% -0.02% -0.18% 3% 1%
Class C 0.03% -0.05% -0.03% 1% 1% 0.03% -0.03% -0.18% 0% -2%
Class E 0.01% 0.06% -0.01% 3% 1% - - - - -
Overall 0.03% 0.04% 0.02% 3% 1% 0.01% -0.03% -0.09% 3% 1%
Class D 0.05% -0.06% -0.15% 1% 1% -0.01% -0.62% -0.29% 0% 0%
Class F 0.05% -0.02% -0.02% 1% 1% -0.01% -0.13% -0.10% 1% 0%
TABLE III: Performance of the proposed trellis pruning method under AI and RA configurations

IV-B Trellis Pruning

In TCQ, RD cost calculations and comparisons are performed over the trellis branches in an effort to find the optimal quantization results, which introduces a high computational cost to the encoder. We propose to perform trellis pruning, which eliminates the unlikely quantization candidates and removes the associated transition routes. In this way, the operation complexity in the BMU and ACS modules can be lowered and the total number of branches can be decreased. More specifically, the trellis pruning is carried out based on the analysis of RD cost relationships with the proposed RD model. The quantization candidate set is composed of five candidates in principle, which can be regarded as adjusting the scalar quantization level with different offsets $\delta$. Again, the level refers to the absolute value of the scalar quantization result at the given position, and the possible values of $\delta$ are as follows,

(46)

We use to represent the explicit quantization candidate associated to offset as follows,

(47)

Supposing is the absolute value of the transform coefficient at position , the total distortions of a coding block when quantizing to can be described as,

(48)

Analogously, if the quantization result of is , the distortion can be expressed as follows,

(49)

The distortion difference between and with respect to can be formulated as,

(50)

According to Eqn. (50), reaches its maximum value when equals 0. Therefore, always provides the lowest quantization distortion.

Meanwhile, when is changed to , the rate difference can be estimated with our proposed model in Eqn. (26) as follows,
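As a rough numerical illustration of this observation, the sketch below compares the squared error of each candidate level against the unadjusted scalar level. It assumes a plain uniform reconstruction (level times step size) and nearest-integer rounding, which ignores the two-quantizer reconstruction rule of TCQ. The offset set, `scalar_level`, `candidate_distortions` and `step` are hypothetical names and values chosen only for this example.

```python
def scalar_level(coeff_abs: float, step: float) -> int:
    """Nearest-integer scalar quantization of an absolute coefficient (assumed rule)."""
    return int(coeff_abs / step + 0.5)


def candidate_distortions(coeff_abs: float, step: float, offsets=(-2, -1, 0, 1, 2)):
    """Squared error for each candidate level q + offset; negative levels are skipped.

    The offset set is an illustrative assumption, not taken from the paper.
    """
    q = scalar_level(coeff_abs, step)
    result = {}
    for off in offsets:
        level = q + off
        if level < 0:
            continue
        recon = level * step  # assumed uniform reconstruction
        result[off] = (coeff_abs - recon) ** 2
    return result


if __name__ == "__main__":
    d = candidate_distortions(coeff_abs=37.0, step=10.0)
    for off, dist in sorted(d.items()):
        print(f"offset {off:+d}: distortion {dist}")
    print("lowest distortion at offset", min(d, key=d.get))
```

Under these assumptions the zero offset always gives the smallest squared error, because the scalar level is by construction the nearest reconstruction point. The rate term is what can make another candidate preferable in the RD sense.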

(51)

where is the difference of the coded index when the quantization level is adjusted by , and can be calculated as follows,

(52)

Herein, and represent the coded indices of quantization candidates and , respectively. Typically, denotes the variation in the number of non-zero coefficients, which can be determined as follows,

(53)
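One plausible reading of the non-zero count variation is sketched below: the count rises by one when a zero level becomes non-zero, drops by one when a non-zero level becomes zero, and is unchanged otherwise. The function name and the sign convention are assumptions made for illustration and may differ from the definition used in Eqn. (53).

```python
def nonzero_count_change(level: int, offset: int) -> int:
    """Change in the number of non-zero coefficients when a level is adjusted.

    Assumed convention: +1 if a zero level becomes non-zero, -1 if a non-zero
    level becomes zero, and 0 otherwise.
    """
    new_level = level + offset
    if level < 0 or new_level < 0:
        raise ValueError("absolute quantization levels must be non-negative")
    return int(new_level > 0) - int(level > 0)


if __name__ == "__main__":
    for level, offset in [(0, 1), (1, -1), (3, -1), (0, 0)]:
        print(level, offset, nonzero_count_change(level, offset))
```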

Subsequently, we discuss the variations of the rate and distortion with the following cases.

IV-B1

In this case, and are both non-negative, such that equals 1. Since the parameters and are positive, Eqn. (51) can be written as follows,

(54)

Both the distortion and the rate may increase if is adjusted to in this scenario, which implies that the remaining quantization candidates are likely to introduce a higher RD cost and hence a coding performance loss. As such, it is reasonable to directly remove the quantization candidates with higher levels without further calculating their RD cost.

IV-B2 or

The corresponding quantization index is “1”. The explicit value of depends on , which is formulated as follows,

(55)

It can be observed that when equals -1, is a positive constant, indicating a saving of coding bits. However, it is difficult to intuitively predict the final variation of the RD cost, since the associated distortion also increases. Moreover, it can be inferred that a positive leads to an increase of the coding bits. Therefore, larger quantization candidates can be removed from the trellis graph in such scenarios.

IV-B3

Although a negative results in a saving of coding bits, it is still difficult to predict the actual variation of the RD cost,

(56)

When equals , a remarkable increase in distortion can be observed, which cannot be compensated by the saving of coding bits. As such, we propose to skip the check of candidate level 0.

We provide two example quantization candidate sets to better illustrate the technical details of the pruning. The first set conforms to the former two cases, where the numbers inside and outside of the square brackets denote the quantization indices and quantization levels, respectively. The proposed pruning procedure is demonstrated in Fig. 6. Initially, following the transition rule of TCQ, candidates “” and “” are coupled and bound to the quantizer , where State 0 and State 1 are assigned as transmitting states. Similarly, “” and “” are paired with quantizer , which is associated with State 2 and State 3. The candidate “” is bound to every state. The accumulated RD cost is calculated for each node over all available branches, and the branch with the minimal cost is retained in the trellis graph. With the proposed method, branches that are unlikely to be selected because of their larger quantization levels, such as “” and “”, are directly pruned without RD checking, leading to lower computational complexity. The second set corresponds to the last case, where the transition routes of “” can be pruned, as illustrated in Fig. 7. Since the proposed pruning method does not have any influence on the state transition, the dequantization process remains consistent. In practice, we map the relationship of and , and to the and associated thresholds, in order to avoid the calculation of . In this way, given the pruning strategy can be efficiently determined.
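The effect of pruning on a single add-compare-select stage can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation adopted in the VVC test model: `STATE_TRANS` follows the four-state transition rule commonly described for VVC dependent quantization, `toy_rate` is a placeholder rate model, and `keep_candidate` with its `zero_prune_min` threshold is a hypothetical stand-in for the thresholds derived from the RD model. Using the same candidate levels and reference level for all states is a further simplification.

```python
# Four-state machine: next state indexed by [state][parity of the level].
STATE_TRANS = ((0, 2), (2, 0), (1, 3), (3, 1))
INF = float("inf")


def recon(level: int, state: int, step: float) -> float:
    """Reconstruction of an absolute level: states 0-1 use Q0, states 2-3 use Q1."""
    if level == 0:
        return 0.0
    return (2 * level - (state >> 1)) * step


def toy_rate(level: int) -> float:
    """Placeholder rate model (assumption): bits grow linearly with the level."""
    return 1.0 + 2.0 * level


def keep_candidate(level: int, ref_level: int, zero_prune_min: int) -> bool:
    """Hypothetical pruning rule: drop levels above the reference level, and
    skip level 0 once the reference level reaches zero_prune_min."""
    if level > ref_level:
        return False
    if level == 0 and ref_level >= zero_prune_min:
        return False
    return True


def acs_stage(prev_cost, coeff_abs, step, lam, levels, ref_level=None, zero_prune_min=2):
    """One add-compare-select stage.

    Returns the per-state minimal accumulated cost and the number of branches
    whose RD cost was actually evaluated. Pruning is active only when
    ref_level is given.
    """
    new_cost = [INF] * 4
    branches = 0
    for state in range(4):
        if prev_cost[state] == INF:
            continue
        for level in levels:
            if ref_level is not None and not keep_candidate(level, ref_level, zero_prune_min):
                continue  # pruned branch: no RD cost is computed
            branches += 1
            dist = (coeff_abs - recon(level, state, step)) ** 2
            cost = prev_cost[state] + dist + lam * toy_rate(level)
            nxt = STATE_TRANS[state][level & 1]
            if cost < new_cost[nxt]:
                new_cost[nxt] = cost
    return new_cost, branches


if __name__ == "__main__":
    start = [0.0] * 4
    levels = (0, 3, 4, 5)
    full = acs_stage(start, 37.0, 5.0, 1.0, levels)
    pruned = acs_stage(start, 37.0, 5.0, 1.0, levels, ref_level=4)
    print("full search  :", full)
    print("with pruning :", pruned)
```

For this toy input the pruned stage evaluates half as many branches and keeps the same surviving costs. That outcome depends on the chosen rate model and thresholds and is not guaranteed in general, which is why the BD-Rate impact has to be measured experimentally.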

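Class Sequence AI (five columns) RA (five columns)
BD-Rate(Y) BD-Rate(U) BD-Rate(V) Quantization-time-saving Encoding-time-saving BD-Rate(Y) BD-Rate(U) BD-Rate(V) Quantization-time-saving Encoding-time-saving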
A1 Tango2 0.08% 0.25% 0.38% 35% 15% 0.03% -0.08% 0.11% 36% 5%
FoodMarket4 0.13% 0.30% 0.20% 28% 11% 0.09% 0.13% 0.10% 30% 2%
Campfire 0.07% 0.13% 0.12% 27% 12% 0.02% 0.17% 0.06% 27% 5%
A2 CatRobot1 0.11% 0.26% 0.26% 29% 13% -0.01% -0.03% 0.10% 36% 9%
DaylightRoad2 0.17% 0.41% 0.25% 28% 13% 0.03% 0.14% 0.08% 37% 9%
ParkRunning3 0.08% 0.17% 0.23% 19% 8% 0.04% 0.13% 0.14% 26% 5%
B MarketPlace 0.08% 0.20% 0.42% 25% 11% 0.03% -0.04% 0.18% 28% 6%
RitualDance 0.08% 0.23% 0.35% 22% 10% 0.07% -0.13% 0.16% 24% 4%
Cactus 0.10% 0.13% 0.30% 25% 12% 0.06% 0.02% -0.02% 26% 6%
BasketballDrive 0.09% 0.18% 0.26% 27% 13% 0.07% -0.11% 0.00% 26% 3%
BQTerrace 0.08% 0.12% 0.40% 20% 10% 0.05% 0.75% -0.12% 24% 3%
C BasketballDrill 0.15% 0.04% 0.13% 22% 11% 0.08% -0.04% -0.15% 27% 4%
BQMall 0.10% 0.15% 0.10% 23% 10% 0.05% -0.04% -0.08% 23% 2%
PartyScene 0.12% -0.04% 0.30% 14% 8% 0.03% 0.07% 0.27% 20% 3%
RaceHorses 0.12% 0.02% 0.15% 17% 9% 0.07% 0.21% -0.32% 22% 3%
D BasketballPass 0.09% 0.51% 0.16% 18% 8% 0.09% -0.10% -0.24% 19% 5%
BQSquare 0.16% -0.05% -0.35% 12% 6% 0.03% -0.18% -1.44% 17% 4%
BlowingBubbles 0.15% 0.16% 0.12% 13% 7% 0.14% 0.29% 0.39% 20% 6%
RaceHorses 0.12% 0.05% -0.33% 15% 6% 0.01% -0.33% 0.31% 17% 5%
E FourPeople 0.12% 0.03% 0.31% 20% 10% - - - - -
Johnny 0.12% 0.21% 0.07% 22% 10% - - - - -
KristenAndSara 0.10% 0.00% 0.16% 21% 9% - - - - -
F BasketballDrillText 0.15% 0.00% 0.05% 21% 8% 0.10% 0.26% 0.15% 25% 7%
ArenaOfValor 0.12% 0.13% 0.11% 20% 6% 0.08% 0.21% 0.15% 24% 7%
SlideEditing 0.09% 0.12% 0.22% 8% 4% 0.01% -0.04% 0.00% 9% 2%
SlideShow 0.14% 0.16% -0.08% 12% 4% 0.16% -0.30% -0.38% 10% 3%
Class A1 0.10% 0.23% 0.23% 30% 13% 0.05% 0.07% 0.09% 31% 4%
Class A2 0.12% 0.28% 0.25% 26% 12% 0.02% 0.08% 0.11% 33% 8%
Class B 0.09% 0.17% 0.35% 24% 11% 0.06% 0.10% 0.04% 26% 4%
Class C 0.12% 0.05% 0.17% 19% 10% 0.06% 0.05% -0.07% 23% 3%
Class E 0.11% 0.08% 0.18% 21% 10% - - - - -
Overall 0.11% 0.16% 0.24% 24% 11% 0.05% 0.08% 0.03% 27% 5%
Class D 0.13% 0.17% -0.10% 15% 7% 0.07% -0.08% -0.25% 18% 5%
Class F 0.13% 0.10% 0.07% 15% 6% 0.09% 0.03% -0.02% 17% 5%
TABLE IV: Performance of the combination of the proposed postponement of the trellis initial point and the trellis pruning method under AI and RA configurations

V Experimental Results

V-A Performance Evaluations

The proposed low complexity TCQ approaches are implemented on the VVC test platform VTM-4.0 [24]. Simulations are conducted in conformance with the JVET Common Test Conditions (CTC) [3], where the recommended test sequences from class A to class F are all included in the experiment under AI and RA configurations. The QP values are set to {22, 27, 32, 37}, and BD-Rates [2] for the Y, U and V components are used to evaluate the coding performance, where a negative value denotes a performance gain. Computational complexity reduction is measured with the total encoding time saving and the quantization time saving as follows,