CQM
Cumulative quality model for HTTP Adaptive Streaming
Thanks to the abundance of Web platforms and broadband connections, HTTP Adaptive Streaming has become the de facto choice for multimedia delivery nowadays. However, the visual quality of adaptive video streaming may fluctuate strongly during a session due to bandwidth fluctuations. Therefore, it is important to evaluate the quality of a streaming session over time. In this paper, we propose a model to estimate the cumulative quality for HTTP Adaptive Streaming. In the model, a sliding window of video segments is employed as the basic building block. Through statistical analysis using a subjective dataset, we identify three important components of the cumulative quality model, namely the minimum window quality, the last window quality, and the average window quality. Experiment results show that the proposed model achieves high prediction performance and outperforms related quality models. In addition, another advantage of the proposed model is its simplicity and effectiveness for deployment in real-time estimation. The source code of the proposed model has been made available to the public at https://github.com/TranHuyen1191/CQM.
HTTP Adaptive Streaming (HAS) has become the de facto choice for multimedia delivery nowadays. In HAS, a video is encoded into different quality versions [thang2012_TCE_full]. Each version is further divided into a series of segments. Depending on throughput fluctuations, segments of appropriate quality versions will be delivered from the server to the client, resulting in quality variations during a session. Therefore, a key challenge in HAS is how to evaluate the quality of a session over time. The evaluation can provide service providers with suggestions to enhance the quality of services [TMM_2013_4]. Also, some existing studies deploy the quality model to build and evaluate effective adaptive streaming strategies [TMM_2018_1, TMM_2017_3].
Here, we would like to differentiate three concepts of quality as follows.
Overall quality means the cumulative quality measured at the end of the session. Obviously, this concept is a special case of the cumulative quality.
Continuous quality means the instantaneous quality which is continuously perceived at any moment of the session.
Cumulative quality means the quality cumulated from the beginning up to any moment of the session.
It should be noted that the concepts of continuous quality and overall quality have been mentioned in Recommendations ITU-R BT.500-13 and ITU-T P.880 [ITU2004P880, ITU2012BT500] and have been investigated in a large number of previous studies.
To the best of our knowledge, however, few previous studies have actually considered the cumulative quality. In [QoE_tobias2011_memory], the cumulative quality was investigated in the context of Web services. The work in [QoE_ZWangTimevarying] was the first study on the cumulative quality of a video streaming session, where the authors focused on the impact of quality variations. However, this work employed very short sessions of only 5–15 seconds.
In this study, our goal is modeling the cumulative quality of HTTP adaptive video streaming. We first carry out a subjective test to measure the cumulative quality of long sessions of 6 minutes. Then, the impacts of quality variations, primacy, and recency are investigated. Based on the obtained results, a cumulative quality model (called CQM) is proposed. In the proposed model, a sliding window of video segments is the basic unit of computation. It should be noted that, in the following, the term "window" means either the conceptual sliding window or a window at a certain location. Experiment results show that the average window quality, the minimum window quality, and the quality of the last window are key components of the cumulative quality model. Also, it is found that the proposed model outperforms six existing models. Moreover, the proposed model is applicable to real-time quality monitoring thanks to its low computation complexity. To the best of our knowledge, the proposed model is the first cumulative quality model for actual streaming sessions.
The remainder of this paper is organized as follows. Section 2 discusses the related work. Because the proposed model is based on an analysis of subjective results, the subjective test is presented in Sect. 3. Then, Sect. 4 presents the proposed cumulative quality model. In Sect. 5, we evaluate the performance and computation complexity of the proposed model and compare it to six existing models. Finally, conclusions are drawn in Sect. 6.
In this section, we will discuss the work related to three types of quality, namely, 1) overall quality, 2) continuous quality, and 3) cumulative quality.
The overall video streaming experience of end-users can be quantified with the concept of Quality of Experience (QoE). In terms of video streaming, the QoE expresses to what extent users are annoyed or delighted with the provided streaming [qualinet2013qoe]. In [QoE_tobias2013_DTMA, QoE_tobias2012_initial], it was found that the impact of the initial delay of the video stream is not severe, whereas the impact of stalling, i.e., playback interruptions, is significant. To model the impact of the interruptions, previous studies generally used statistics such as the number of interruptions [QoE_singh2012qualityML, QoE_liu2015_deriving], the average [QoE_singh2012qualityML], the maximum [QoE_singh2012qualityML], the sum [QoE_rodriguez2016_video, QoE_liu2015_deriving], and the histogram [tran2016_GC] of interruption durations. To ensure smooth streaming when end-users face throughput fluctuations, e.g., in mobile networks, HAS allows the video bit rate to be adapted to the network conditions. Thereby, initial delay and stalling, which are severe QoE degradations of video streaming, can be reduced. However, due to the bit rate adaptation, the visual quality of the video might vary, which introduces an additional QoE factor [QoE_seufert2015_survey].
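The interruption statistics mentioned above can be sketched as follows. This is a hypothetical helper for illustration only; the cited models each define their own exact feature sets.

```python
def stall_features(stall_durations):
    """Summarize the playback interruptions of one streaming session.

    stall_durations: list of stall lengths in seconds (hypothetical input
    format; one entry per interruption).
    """
    if not stall_durations:
        return {"count": 0, "avg": 0.0, "max": 0.0, "total": 0.0}
    return {
        "count": len(stall_durations),                      # number of interruptions
        "avg": sum(stall_durations) / len(stall_durations),  # average duration
        "max": max(stall_durations),                         # longest interruption
        "total": sum(stall_durations),                       # sum of durations
    }
```

A histogram of interruption durations, as used in [tran2016_GC], could be added on top of the same input in a similar way.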
Existing studies on overall visual quality were mostly limited to short sessions (about 1–3 minutes) [QoE_bellLab2013_QoEmodel, QoE_ywang2015_assessing, tran2017_IEICEhistogram, QoE_tobias2014_assessing, QoE_seufert2015_impact]. These studies mainly focused on the impact of quality variations, which is modeled by statistics of segment quality values and switching amplitudes (i.e., differences between consecutive segment quality values) such as the average [QoE_bellLab2013_QoEmodel], minimum [QoE_ywang2015_assessing], median [QoE_ywang2015_assessing], histogram [tran2017_IEICEhistogram], and time duration at different quality levels [QoE_tobias2014_assessing, QoE_seufert2015_impact].
For long sessions, primacy and recency are also important factors to be considered. Here, the primacy (recency) factor refers to the influence of quality degradations near the beginning (end) of a session. The authors in [QoE_tavakoli2016_JSAC] found that primacy and recency both have significant impacts on the overall quality of a session. The work in [QoE_Seufert2013_pool] studies different temporal pooling methods, which emphasize different aspects (e.g., recency, lowest quality), for aggregating objective quality metrics into an overall quality score. In [QoE_rodriguez2016_video], the authors proposed an overall quality model taking into account the impacts of quality variations, primacy, and recency. Specifically, a session is divided into three temporal intervals. In each interval, the impact of quality variations is modeled by the frequencies of switching types, where each switching type is defined based on resolutions and frame rates. To take into account the impact of primacy and recency, each interval is simply assigned a weight representing its contribution to the overall quality of the session. The experiment results then revealed that the first interval has the highest weight, and thus the largest contribution to the overall quality.
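As an illustration of the segment-level statistics discussed above, the sketch below computes switching amplitudes and a few of the mentioned summary statistics. The helper names are hypothetical; the cited models each use their own exact features.

```python
def switching_amplitudes(segment_quality):
    """Differences between consecutive segment quality values."""
    return [b - a for a, b in zip(segment_quality, segment_quality[1:])]

def variation_stats(segment_quality):
    """A few common statistics of segment quality and switching amplitudes."""
    amps = switching_amplitudes(segment_quality)
    qs = sorted(segment_quality)
    mid = len(qs) // 2
    median = qs[mid] if len(qs) % 2 else (qs[mid - 1] + qs[mid]) / 2
    return {
        "avg_quality": sum(segment_quality) / len(segment_quality),
        "min_quality": min(segment_quality),
        "median_quality": median,
        # mean absolute switching amplitude; 0 if there is only one segment
        "avg_abs_switch": sum(abs(a) for a in amps) / len(amps) if amps else 0.0,
    }
```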
In the latest stage of the ITU-T P.1203 standardization for quality assessment of streaming media, a model (called P.1203) is recommended for predicting the overall quality, where session durations are from 1 to 5 minutes [ITU1203_3]. The P.1203 model also takes into account the impacts of quality variations, primacy, and recency. To model the impact of quality variations, the authors used the average of the segment quality values in each temporal interval and various statistics calculated over a whole session, such as the total number of quality direction changes and the difference between the maximum and minimum segment quality. To take into account the impact of primacy and recency, the authors used a weighted sum of all segment quality values in the session.
Recommendation ITU-R BT.500-13 describes the Single Stimulus Continuous Quality Evaluation (SSCQE) method for subjective assessment of continuous quality. In this method, test sessions are displayed in a random order. Each subject, while watching a video, is asked to continuously move a slider along a continuous scale so that its position reflects his/her judgment of the quality at that instant. All subjects' quality ratings at each instant of each video are averaged to compute a mean opinion score (MOS) for that instant.
The work in [QoE_Chen2014] is the first study on the continuous quality of a streaming session. Note that, in that paper, the authors use the term "time-varying quality" to refer to continuous quality. To measure the continuous quality, the authors conducted a subjective test similar to the SSCQE method. Then, a continuous quality model was proposed, taking into account the impact of the recency. In particular, a Hammerstein-Wiener model was employed to predict the continuous quality of 5-minute-long sessions. As this work focused on continuous quality, the model only depends on the quality values of the last 15 seconds.
[QMon_shafiq_infocom18] uses machine learning to predict initial delay, stalling, and video quality from the network traffic in windows of 10 s. The considered features are derived from IP or TCP/UDP headers only. ViCrypt [QMon_seufert2019_stream] detects QoE degradations on encrypted video streaming traffic in real-time within 1 s by using a stream-like analysis approach with two continuous sliding windows and a cumulative window. The features are based on packet-level statistics of the network traffic and allow to accurately recognize initial delay and stalling [QMon_seufert2019_stream], as well as video resolution and the average bitrate [QMon_wassermann2019_let]. [texas2018] presents a continuous-time QoE predictor using an ensemble of Hammerstein-Wiener models, while [QoE_BampisRecurrent2018ML] developed a neural-network-based continuous quality model. As discussed in Recommendation ITU-R BT.500-13 [ITU2012BT500], the continuous quality values of a session can be utilized to obtain the overall quality. However, this issue is currently under study [ITU2012BT500, QoE_bovikTimeVarying2011, QoE_bampis2017Dataset].
To the best of our knowledge, the only previous study on the cumulative quality of a streaming session is [QoE_ZWangTimevarying], where the authors presented some qualitative observations regarding the impact of quality variations. However, the authors employed simple simulated sessions of very short durations (5–15 seconds) with only 1–3 segments. It is found that when there is a small switching amplitude, the cumulative quality is quite stable with a slight change. Meanwhile, a large switching amplitude results in a significant change of the cumulative quality. From these observations, the authors proposed a cumulative quality model, in which a piecewise linear function of switching amplitudes was used to quantify the impact of the quality variations.
The preliminary work of our cumulative quality research was presented in [tran2018_QoMEX]. In this paper, the previous work is extended significantly. First, we carried out more subjective tests with new videos, so the dataset is now doubled. Second, the factors in the model are extensively studied with one-way analysis of variance (ANOVA). In addition, the impacts of the window size and the window quality model on the model performance are explored in detail and the best setting is recommended. Finally, the evaluation is extended with more related models and an in-depth analysis of the models' performances with respect to sequence length as well as their computation complexity.
The contributions of our work fall into two general categories. First, we build a dataset that is specific to cumulative quality. Our dataset helps to investigate how well existing overall quality models perform at cumulative quality prediction. Second, we propose a new cumulative quality model that can accurately predict the cumulative quality of streaming sessions. In particular, the distinguishing features of our study are as follows.
First, a subjective test was specifically designed for measuring the cumulative quality of HAS sessions. In our test, there are in total 72 test sequences generated from six 6-minute-long videos. The total time required for rating these sequences was approximately 160 hours.
Second, through statistical analysis, insights into the impacts of the three factors of quality variations, primacy, and recency are provided. In particular, it is found that the impacts of the quality variations and recency are significant, whereas no significant impact of the primacy is observed.
Third, we propose a new cumulative quality model that takes into account the impacts of the quality variations and recency. Experiment results show that the proposed model is able to predict the cumulative quality of streaming sessions well.
Fourth, a comparison of the proposed model with six existing models was conducted. This is the first time a large number of quality models have been investigated for cumulative quality prediction. Experiment results show that the proposed model outperforms the existing models.
Fifth, it was found that the proposed model is applicable to real-time quality monitoring thanks to its low computation complexity. This feature is especially important for cost-effective evaluation of streaming technologies.
In this study, to measure the cumulative quality over time, each streaming session was converted into test sequences of different lengths. In the test, each subject viewed a random sequence and then rated the quality of the whole sequence. This approach is similar to that used in [QoE_ZWangTimevarying], where each 15-second-long session was divided into three sequences of 5, 10, and 15 seconds.
Video  Content  Type 

Video #1  Slow movements of characters  Animated video, Movie 
Video #2  A story about Sintel and her friend, a dragon.  Animated video, Movie 
Video #3  Conversations of characters  Natural video, Movie 
Video #4  A talk show host analyzing news  Natural video, News 
Video #5  A documentary about a science experiment  Natural video, Documentary 
Video #6  A soccer match  Natural video, Sport 
There are in total six 6-minute-long videos used in this study, denoted by Video #1, Video #2, Video #3, Video #4, Video #5, and Video #6, with features presented in Table 1. These videos were encoded using H.264/AVC (libx264) with a frame rate of 24 fps. In this study, we used two adaptation sets, each consisting of 9 versions with different QP values and/or resolutions. In particular, the 9 versions in the first adaptation set have the same resolution of 1280×720 and 9 different QP values of 52, 48, 44, 40, 36, 32, 28, 24, and 20. The first adaptation set was used to generate the streaming sessions of Video #1, Video #2, and Video #3. The 9 versions in the second adaptation set differ in both resolution and QP. Specifically, they correspond to the 9 combinations of QP values and resolutions {24, 256×144}, {26, 426×240}, {24, 426×240}, {26, 640×360}, {24, 640×360}, {26, 854×480}, {24, 854×480}, {26, 1280×720}, and {24, 1280×720}. The second adaptation set was used to generate the streaming sessions of Video #4, Video #5, and Video #6. The average bitrates of the versions are shown in Table 2. In this study, every version is divided into short segments with a duration of 1 second.
Version  Average bitrate (kbps)  
Video #1  Video #2  Video #3  Video #4  Video #5  Video #6  
1  146  187  187  179  455  570 
2  196  239  244  310  794  1034 
3  310  333  353  382  1010  1304 
4  455  482  528  548  1397  1823 
5  717  717  813  675  1764  2295 
6  1118  1097  1263  791  2017  2647 
7  1751  1743  2005  977  2549  3330 
8  2802  2910  3362  1303  3209  4382 
9  4538  4993  6089  1613  3930  5500 
For each video, two full-length sessions of 6 minutes were generated by using the adaptation method of [thang2013_JCN] and two bandwidth traces from a mobile network [HAS_muller2012_evaluation]. The duration of 6 minutes was selected such that it is longer than the average video duration watched on YouTube, which is 5:01 minutes [QoE_nam2016_qoewhycat]. The bandwidth traces have average throughputs varying from 1484.87 kbps to 3432.33 kbps, and standard deviations from 867.01 kbps to 1252.75 kbps. An example of version variations in a 6-minute session is provided in Fig. 1.
From each full-length session, six test sequences were extracted, from timestamp 0 to the 1st, 2nd, 3rd, 4th, 5th, and 6th minute. So, from the six original videos, there were in total 72 test sequences, with durations from 1 minute to 6 minutes. The total duration of all the test sequences is 252 minutes. Because a rating time longer than 1.5 hours may cause fatigue and boredom [P.9132014], the subjective test was divided into four parts that were conducted on different days. The duration of each part was approximately 1.5 hours, of which about 1 hour was spent rating the test sequences. In the rating process, every 20 minutes there was a break of 10 minutes. In order to avoid boredom, each subject took part in at most two test parts.
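The prefix extraction described above can be sketched as follows; this is a hypothetical helper assuming a session represented as a list of 1-second segments.

```python
def extract_prefixes(session_segments, minutes=(1, 2, 3, 4, 5, 6)):
    """Return test sequences as prefixes of a full session.

    session_segments: the full session as a list of 1-second segments.
    minutes: prefix lengths in minutes (timestamp 0 up to each minute mark).
    """
    return [session_segments[: m * 60] for m in minutes]
```

Applied to a 6-minute session, this yields six sequences of 60, 120, ..., 360 segments.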
The subjective test was conducted using the absolute category rating (ACR) method. Test conditions were designed following Recommendation ITU-T P.913 [P.9132014]. In the subject-training stage, the subjects got used to the procedure and the range of quality impairments. In the test, the sequences were randomly displayed on a black background. The screen had a size of 14 inches and a resolution of 1366×768. Given a sequence, each subject gave a score at the end of the sequence with a value ranging from 1 (worst) to 5 (best), reflecting his/her opinion of the quality of the whole sequence.
There were in total 71 subjects taking part in the test. The total time of the test was approximately 160 hours. Screening analysis of the test results was performed following Recommendation ITU-T P.913 [P.9132014], and two subjects were rejected. After discarding these subjects' scores, each test sequence was rated by 23 valid subjects. The MOS of each sequence was computed as the average of the valid subjects' scores.
The 95% confidence intervals of subjective scores are shown in Fig. 2. In general, the confidence intervals are in the range from 0.08 to 0.35. Also, the subjective scores are in the range from 2 to about 4.7. This means the cumulative quality varies drastically during a session.
To build a cumulative quality model taking into account the impacts of multiple factors, the basic ideas of our solution are as follows.
Quality variations over a long session are divided into long-term and short-term changes. Specifically, short-term changes refer to quality variations of neighboring segments, while long-term changes refer to quality variations between temporal intervals.
To represent the impact of long-term changes, the concept of a "sliding window" is used. Specifically, a window of segments is moved along the session, segment by segment, as illustrated in Fig. 3. After each move, a window quality value is computed.
To represent the impact of short-term changes within a window, an existing overall quality model is used. For this purpose, such a model is called the window quality model.
The cumulative quality value at any time point is computed based on window quality values, taking into account the impacts of factors such as long-term changes and recency. Note that, at the first time points, when the watched video duration is (very) short (i.e., shorter than the window size), the corresponding cumulative quality values are directly computed from the window quality model.
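The sliding-window step above can be sketched as follows. For illustration, the window quality model is replaced by a simple mean of segment quality values; an actual deployment would plug in a full overall-quality model (e.g., Tran's) at that point.

```python
def window_qualities(segment_quality, w):
    """Window quality value at each window position.

    segment_quality: per-segment quality values of the session so far.
    w: window size in segments.
    Placeholder window quality model: the mean segment quality in the window.
    """
    n = len(segment_quality)
    if n < w:
        # Session shorter than one window: score the whole prefix directly,
        # mirroring the handling of the first time points described above.
        return [sum(segment_quality) / n]
    # One value per window position, sliding segment by segment.
    return [sum(segment_quality[i - w:i]) / w for i in range(w, n + 1)]
```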
In the next subsection, effect analysis of the quality variations, primacy, and recency will first be presented. Then, based on the obtained results, a cumulative quality model will be proposed.
As mentioned, to identify the key components of a cumulative quality model, we carried out a statistical analysis of some window quality values. In particular, the first window quality value and the last window quality value were employed to represent the impacts of the primacy and recency, respectively. For the factor of long-term changes, three parameters are considered, which are the average quality, the minimum quality, and the maximum quality of all windows up to a given time point.
Suppose that the window has just moved so that it ends at segment i (with i ≥ W, where W is the window size in segments). By using the window quality model, the window quality value QW_i is calculated. After that, the statistics Q_first, Q_last, Q_min, Q_max, and Q_avg are updated by the following equations.
Q_first = QW_W  (1) 
Q_last = QW_i  (2) 
Q_min = min(Q_min, QW_i)  (3) 
Q_max = max(Q_max, QW_i)  (4) 
Q_avg = ((i − W) · Q_avg + QW_i) / (i − W + 1)  (5) 
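These running statistics over window quality values (first, last, minimum, maximum, and average) can be maintained in constant time per window slide. A minimal sketch with hypothetical attribute names:

```python
class WindowStats:
    """Running statistics over window quality values (hypothetical names)."""

    def __init__(self):
        self.n = 0            # number of windows seen so far
        self.first = None     # quality of the first window (primacy)
        self.last = None      # quality of the most recent window (recency)
        self.minimum = None
        self.maximum = None
        self.average = None

    def update(self, qw):
        """O(1) update when the window slides one segment and yields qw."""
        if self.n == 0:
            self.first = self.minimum = self.maximum = self.average = qw
        else:
            self.minimum = min(self.minimum, qw)
            self.maximum = max(self.maximum, qw)
            # running mean over n + 1 window quality values
            self.average = (self.n * self.average + qw) / (self.n + 1)
        self.last = qw
        self.n += 1
```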
Window size (s)    First  Last  Max  Min  Avg  
30  F  0.01  13.53  3.70  45.87  70.12  
  p  0.94  <0.001  0.06  <0.001  <0.001  
  η²  0.00  0.17  0.05  0.40  0.51  
50  F  0.73  39.51  0.00  65.32  67.10  
  p  0.40  <0.001  0.97  <0.001  <0.001  
  η²  0.01  0.37  0.00  0.49  0.50  
70  F  1.20  41.18  0.39  52.34  64.83  
  p  0.28  <0.001  0.76  <0.001  <0.001  
  η²  0.02  0.38  0.01  0.43  0.49 
Table 3 shows the obtained results from one-way analysis of variance (ANOVA). To assess the effect size, partial eta-squared (η²) values are also reported in Table 3. Here, the window quality model is the model proposed in [tran2016_GC] (called Tran's), and the window size is set to 30, 50, and 70 seconds.
The values in Table 3 indicate that, for all the considered window sizes, no significant effect was observed for the first window quality (p ≥ 0.28). In contrast, significant results with large effect sizes were obtained for the last window quality (p < 0.001). This implies that the impact of the primacy on the cumulative quality can be neglected, while the impact of the recency has to be considered.
With regard to long-term changes, no significant effect was found for the maximum window quality (p ≥ 0.06), yet significant effects with large sizes were observed for the minimum and the average window quality (p < 0.001). This implies that only the minimum and average window qualities have to be considered.
To sum up, the results suggest that the minimum window quality Q_min, the average window quality Q_avg, and the last window quality Q_last should be key components of a cumulative quality model. Based on these observations, we propose a cumulative quality model which is given by
Q_cum = w1 · Q_min + w2 · Q_avg + w3 · Q_last,  (6) 
where w1, w2, and w3 are the corresponding weights of the Q_min, Q_avg, and Q_last components.
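The resulting weighted combination can be sketched as follows; the weight values here are illustrative placeholders only (in practice they are obtained by curve fitting on a training set, as described in Sect. 5).

```python
def cqm_score(q_min, q_avg, q_last, w=(0.2, 0.5, 0.3)):
    """Cumulative quality as a weighted sum of the minimum, average,
    and last window quality values.

    w: (w1, w2, w3) weights; the defaults are illustrative placeholders,
    not the fitted values reported in the paper.
    """
    w1, w2, w3 = w
    return w1 * q_min + w2 * q_avg + w3 * q_last
```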
It is interesting to note that the proposed model is in agreement with the peak-end rule [QoE_Kahneman1993]. The peak-end rule says that users judge an experience largely by its peak and its end. Here, the peak of a session is the most severe quality impairment, i.e., the minimum window quality, and the end corresponds to the recency effect shown in our model. In the case of speech quality, a large impact of the minimum quality on QoE was also shown [QoE_Koster2017]. For HTTP Adaptive Streaming, it was found in [QoE_tobias2014_assessing, QoE_seufert2015_impact] that the number of quality switches is not statistically significant, but the time the video is played out at each quality level is. Also, [QoE_Seufert2013_pool] showed that a good temporal pooling method is taking the average over the whole session, implying that the average quality is a key influence factor. Thus, all the key factors of the proposed model are in line with the findings of previous studies. Yet, the CQM model is the first one that integrates these factors into a single model for predicting the cumulative quality of HAS sessions.
In the next section, we will investigate the performance of the proposed model and some existing models.
This section is divided into two evaluations, each aiming at an important question. In the first evaluation, we investigate the best setting (e.g., window quality model and window size) for the proposed model. The second evaluation is carried out to see whether existing overall quality models can predict cumulative quality, especially in long sessions.
There are in total six existing models employed in this study, denoted by Tran's [tran2016_GC], Guo's [QoE_ywang2015_assessing], Vriendt's [QoE_bellLab2013_QoEmodel], Yin's [QoE_Yin_2015], P.1203 [ITU1203_3, ITUT_implement1, ITUT_implement2, ITUT_implement3], and Rehman's [QoE_ZWangTimevarying]. Among these models, only Rehman's model was proposed for cumulative quality prediction; the other models were originally proposed for overall quality prediction.
Similar to [QoE_Database_ZDuanmu2018, QoE_QoEIndex_ZDuanmu2018], to evaluate the performance of the existing models, we implemented the models using the parameter settings stated in the original papers. In addition, following Recommendation ITU-T P.1401 [ITUT_Rec1401], a first-order linear regression between predicted scores and MOSs was performed for each model to compensate for possible variances between subjective tests. The obtained coefficients of slope and intercept will be stated in the following subsections.
For the evaluations, the 72 sequences in our dataset were randomly divided into two sets, namely a training set of 36 sequences and a test set of the 36 remaining sequences. The training set was used to obtain the model parameters by curve fitting, and the test set was used to evaluate the performance of the models. We randomly selected 50 training sets; for each training set, the remaining sequences were used as the corresponding test set.
In order to measure the performance of the models, we used two metrics: the Pearson Correlation Coefficient (PCC) and the Root-Mean-Squared Error (RMSE). The PCC and RMSE values reported below were averaged over the 50 test sets. Since the capability of real-time processing is an especially important feature for cumulative quality models, we also measured the computation complexity of the models. In this study, the computation complexity was measured as the average time required to obtain a cumulative quality value per 1-second-long segment. The measurement was conducted on a computer with an Intel Core i3-2120 processor at 3.30 GHz and 8 GB RAM.
Model  Performance  

Training set  Test set  
PCC  RMSE  PCC  RMSE  
CQM+Tran’s  0.94  0.26  0.93  0.27 
CQM+Guo’s  0.91  0.31  0.89  0.34 
CQM+Vriendt’s  0.93  0.29  0.92  0.31 
CQM+Yin’s  0.92  0.31  0.91  0.33 
CQM+P.1203  0.94  0.26  0.92  0.28 
In this subsection, we investigate the performance of the proposed model under different settings. Our goal is to find the best settings of 1) window quality model and 2) window size of the proposed model.
For this purpose, we first present results using some different window quality models. Then, various window sizes are investigated. Finally, the model parameters are determined based on result analysis.
In this part, the five overall quality models, i.e., Tran's, Guo's, Vriendt's, Yin's, and P.1203, are employed to obtain window quality values. Note that these models all take into account the impact of short-term changes. Further, note that Rehman's is a cumulative quality model, which is therefore not used here but only later for comparison purposes.
Table 4 shows the performance of the CQM model using the different window quality models with the window size of 50 seconds. It can be seen that the performance of the CQM model is generally good with all the window quality models. Especially, the combination CQM+Tran's provides the best prediction performance. Specifically, the values of PCC and RMSE are 0.94 and 0.26 for the training set, and 0.93 and 0.27 for the test set. The main reason is that Tran's model utilizes the histograms of segment quality values and switching amplitudes, which are shown to be more effective in modeling the impact of short-term changes than the statistics used in the other models [tran2017_IEICEhistogram].
Since the combination CQM+Tran’s provides the best performance, Tran’s model is used as the window quality model in the rest of this paper.
In this part, the performance of the CQM model is evaluated using different window sizes. As mentioned, Tran's model is employed to obtain window quality values. Fig. 4 shows the performance of the proposed model with window sizes ranging from 2 to 90 seconds with a step size of 2 seconds. It is clear that, given a window size, the training set always achieves higher PCC values and lower RMSE values than the test set. In general, the behaviors of the PCC and RMSE curves for the training and test sets are similar. In particular, the prediction performance improves quickly (i.e., the PCC value increases quickly while the RMSE value drops quickly) as the window size is increased up to 14 seconds. When the window size is from 14 to 50 seconds, some small improvements are observed in the PCC and RMSE. The best prediction performance for the test set is achieved with the window size of 50 seconds. Specifically, the PCC and RMSE values are 0.94 and 0.26 for the training set, and 0.93 and 0.27 for the test set. When the window size increases beyond 50 seconds, the PCC falls sharply and the RMSE rises dramatically. Therefore, to achieve the highest performance, the window size should be 50 seconds. In the rest of this paper, this value of the window size will be used.
Similar to [QoE_liu2015_deriving], in order to obtain the model parameters, we pick the (best) training set, which provides the highest PCC for the corresponding test set. The best performance is given by
(7)  
(8) 
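The curve fitting mentioned above can be illustrated with ordinary least squares over the three window statistics, solving the 3×3 normal equations with plain Gaussian elimination. This is a hypothetical sketch, not necessarily the authors' exact fitting procedure.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [bv] for row, bv in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_weights(features, mos):
    """Least-squares fit of the three weights.

    features: list of (min, avg, last) window quality triples, one per sequence.
    mos: the corresponding subjective scores.
    """
    A = [[sum(f[i] * f[j] for f in features) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(features, mos)) for i in range(3)]
    return solve3(A, b)
```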
The high numerical values of the weights reconfirm the observations in Sect. 4 that the minimum, average, and last window qualities are key components of the cumulative quality model, and that the impacts of the quality variations and recency on the cumulative quality of a session are significant. In addition, it can be seen that the weight of the average window quality is highest while that of the minimum window quality is lowest. So, the impact of the average window quality is strongest, and the impact of the minimum window quality is weakest.
In this subsection, we compare the CQM model and the six existing models in terms of the prediction performance and the computation complexity.
Fig. 5 shows the PCC values of the models with different sequence lengths. We can see that, when the sequence length is 1 minute, the PCC values of Tran's, Guo's, Vriendt's, Yin's, and P.1203 models are high. This suggests that these models can predict well the overall quality of a short session, and thus each of them can be used as a window quality model with good performance, as discussed in Subsect. 5.2.1.
However, when the sequence length increases, the PCC values of the models decrease. Among the models, the PCC of the CQM model is the highest for all sequence lengths, while the performance of Rehman's model is the lowest. A possible explanation is that Rehman's model was designed using very short sessions with durations of 5–15 seconds, and thus it is not really suitable for longer sessions (i.e., 1–6 minutes). In addition, there is no consideration of long-term changes and recency in Rehman's, Tran's, Guo's, Vriendt's, and Yin's models, so the performances of these models are all lower than that of the CQM model.
Model  Slope  Intercept  PCC  RMSE  Computation complexity (ms)  
Tran’s  1.24  1.27  0.89  0.31  0.22  
Guo’s  1.01  0.25  0.72  0.49  0.02  
Vriendt’s  1.02  0.41  0.85  0.37  0.05  
Yin’s  1.07  0.79  0.80  0.42  0.06  
P.1203  1.04  0.93  0.89  0.32  1682.82  
Rehman’s  3.00  0.42  0.62  0.67  0.05  
CQM  —  —  0.93  0.27  0.20 
Table 5 summarizes the performances and the computation complexity of the models. Here, the PCC and RMSE are averaged over the 50 test sets containing sequences of different lengths. We can see that the results of performances are similar to those in Fig. 5. In particular, the performance of the CQM model is highest and the performance of Rehman’s model is lowest.
Regarding the computation complexity, it can be seen that the CQM model takes less than 1 ms to obtain a cumulative quality value, so the cumulative quality can be updated after every segment as the window slides forward. In other words, the CQM model is applicable to real-time quality monitoring.
For the P.1203 model, the computation complexity is considerably higher than that of the others. In particular, the P.1203 model takes an average of 1.68 s to calculate a cumulative quality value, whereas the remaining models have an average processing time of less than 1 ms per cumulative quality value.
To better understand the cumulative quality, Fig. 6 shows the MOSs and the scores predicted by the CQM model corresponding to the adaptation result in Fig. 1. We can see that the predicted scores closely follow the MOSs. In addition, the cumulative quality fluctuates strongly during the session. This means that evaluating the overall quality at the end of a streaming session is not enough to fully understand the quality of the video streaming service. So, the cumulative quality over time is of crucial importance in adaptive streaming.
In this paper, we have presented a model for predicting the cumulative quality of adaptive video streaming. The proposed model was developed based on the concept of a "sliding window" over a streaming session, where each window is characterized by a quality value.
First, a subjective test was specifically designed and conducted for measuring the cumulative quality. Second, through statistical analysis, it was found that the impacts of the quality variations and recency are significant. We integrated the significant key components, namely the average window quality, the minimum window quality, and the last window quality, into a new cumulative quality model CQM, which is able to accurately predict the cumulative quality of streaming sessions. The advantage of the proposed CQM model is its simplicity, while being in line with well-known effects from the literature, namely the applicability of simple temporal pooling plus the peak-end rule.
The CQM model was compared with six existing models and outperformed all of them in predicting the cumulative quality. Moreover, the proposed model is applicable to real-time quality monitoring thanks to its low computation complexity. This feature is especially important for cost-effective evaluation of streaming technologies, e.g., for real-time quality monitoring of video streams. In the future, the model will be used to assess the quality of different adaptive streaming techniques. Also, we will develop novel quality adaptation strategies based on the CQM model.