Mobile video accounted for 60% of global mobile data traffic in 2016, and this percentage is projected to reach a striking 78% by 2021 (Cisco Visual Networking Index, 2017). Most of this traffic is video-on-demand (VoD) streaming via HTTP Adaptive Streaming (HAS) (Sandvine, 2015), which is fast becoming an integral part of the mobile user's life. To keep pace with this explosion of video traffic, significant progress has been made in the design and development of adaptive streaming solutions and standards. For instance, Dynamic Adaptive Streaming over HTTP (MPEG-DASH) is an international standard that uses the existing HTTP web server infrastructure and has become very popular in recent years (Sodagar, 2011).
The main characteristic of HAS solutions that led to their vast deployment in the market is their ability to adjust the play-out quality during the video session. Their goal is to deliver the highest possible quality of experience (QoE) given the dynamic nature of the wireless channel and the presence of diverse bottlenecks in the video delivery system. To achieve this, video files are encoded in various quality representations, which are stored on a web server. Each representation is subdivided into smaller files called segments, usually of constant duration and of variable size due to the commonly adopted variable bit-rate (VBR) encoding. After obtaining a manifest file with all the necessary video information, the client sequentially requests and downloads each segment in the quality indicated by the deployed HAS algorithm.
User-perceived QoE plays a critical role in the assessment of HAS solutions, since it is directly connected to user engagement and thus to the revenue of content providers. In particular, video stalls and frequent video bit-rate switching are the dominant QoE factors for mobile HAS (Seufert et al., 2015). Avoiding stalls caused by the depletion of the client's play-back buffer, while at the same time minimizing the adaptation frequency and providing high average quality, is a very challenging task, especially under high network load or poor wireless coverage. The inherent trade-offs between the key video QoE metrics (i.e., quality, stability and smoothness) make this task even more difficult.
To address these problems, several HAS algorithms have been proposed recently; they can be classified into three main categories according to the input information they require. First, throughput-based algorithms, such as PANDA (Li et al., 2014) or FESTIVE (Jiang et al., 2012), base their decisions on the observed TCP throughput, which requires a sufficient number of probes to obtain reliable measurements. Second, time-based algorithms such as ABMA+ (Beben et al., 2016) rely on the same probing principle, but use it to estimate the download time of each segment. Lastly, buffer-based algorithms, such as BBA (Huang et al., 2014) and BOLA (Spiteri et al., 2016), observe and react to the level of the client's playback buffer. Despite several recent research efforts and proposals, there still appears to be a lack of consensus and an ongoing debate regarding the merits of these classes of algorithms.
In this paper we try to shed some light on that debate. Specifically, we investigate throughput-based, time-based and buffer-based adaptation algorithms using a set of commonly studied traces of mobile throughput measurements. Since buffer-based adaptation algorithms have not been included in similar comparisons before, the main scope of this work is to provide insight into the merits of each class of algorithms. Our comparison is based on implementing the algorithms in a single simulation framework, which uses throughput information from publicly available network profiles. Furthermore, due to the increasing popularity of live streaming services, which use small buffers because of their strict real-time delay requirements, we also examine the impact of the buffer size on the performance of the HAS algorithms. Therefore, another key contribution of this work is the consideration of two typical maximum buffer occupancy levels, to investigate performance in both live and VoD streaming scenarios.
Several similar comparisons can be found in the literature. In particular, in (Thang et al., 2014) the authors investigate typical adaptation methods in the context of live video streaming. In (Timmerer et al., 2016) the authors conduct both subjective and objective studies of various throughput-based adaptive streaming algorithms, but no other class of algorithms is included. In (Müller et al., 2012) the authors present a similar experimental evaluation of HAS algorithms in mobile vehicular networks. To the best of our knowledge, a study that categorizes the algorithms according to their input dynamics and evaluates their performance on mobile vehicular networks, including live streaming, is missing from the literature. To this end, our work presents a per-class comparison of the latest HAS algorithms, which includes buffer-based and time-based solutions for the first time.
The remainder of the paper is organized as follows. In Section 2 we briefly describe the main principles and properties of the selected state-of-the-art HAS algorithms. In Section 3 we describe the validation process followed to obtain our comparison results, including our implementation parameters and simulation factors. Then, in Section 4 we present our results on the performance of the HAS algorithms. We conclude with our remarks in Section 5.
2. Adaptive streaming algorithms
In this section we briefly present the five state-of-the-art HAS algorithms that were studied, implemented and compared. For each class, we chose the most representative algorithms, i.e., those that have been used in other comparisons in the literature.
2.1. Throughput-based adaptation
A very common TCP throughput-based adaptation scheme (Jiang et al., 2012; Li et al., 2014) follows a four-step adaptation model: first, the available network bandwidth is estimated and then smoothed with noise filters to avoid estimation errors caused by throughput variation. Next, the video bit-rate is selected by quantizing the smoothed estimate. Finally, the next segment request is scheduled once the inter-request time has been estimated.
Conventional is a simple adaptation algorithm based on the four-step model, which equates the currently available bandwidth with the TCP throughput measured during the previous segment download. The video bit-rate is then obtained by applying an exponentially weighted moving average (EWMA) filter and a dead-zone quantizer. The algorithm determines the inter-request time of the next segment using a bi-modal scheduler: the next segment request is scheduled with a constant delay when the buffer is full, and immediately otherwise.
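The four steps of Conventional can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the class name `ConventionalABR`, the EWMA weight (0.2), the 15% dead-zone margin and the 4 s idle delay are hypothetical values chosen for readability.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConventionalABR:
    """Hypothetical sketch of the four-step 'Conventional' heuristic.

    All parameter defaults are illustrative assumptions.
    """
    bitrates: List[float]             # representation bit-rates (bit/s), ascending
    alpha: float = 0.2                # EWMA weight of the newest sample (assumed)
    up_margin: float = 0.15           # dead-zone width for up-switches (assumed)
    idle_delay_s: float = 4.0         # wait before next request if buffer is full (assumed)
    smoothed: Optional[float] = None  # EWMA state (bit/s)
    level: int = 0                    # index of the currently selected representation

    def next_request(self, measured_bps: float, buffer_s: float,
                     max_buffer_s: float) -> Tuple[int, float]:
        # Step 1 (estimation): available bandwidth = TCP throughput measured
        # during the previous segment download.
        # Step 2 (smoothing): exponentially weighted moving average.
        if self.smoothed is None:
            self.smoothed = measured_bps
        else:
            self.smoothed = (self.alpha * measured_bps
                             + (1 - self.alpha) * self.smoothed)

        # Step 3 (quantization) with a dead zone: step down as soon as the
        # estimate falls below the current level, but step up only if the
        # estimate clears the higher level by `up_margin`.
        fits = max((i for i, r in enumerate(self.bitrates) if r <= self.smoothed),
                   default=0)
        if fits < self.level:
            self.level = fits
        elif fits > self.level:
            clears = max((i for i, r in enumerate(self.bitrates)
                          if r * (1 + self.up_margin) <= self.smoothed),
                         default=0)
            self.level = max(self.level, clears)

        # Step 4 (scheduling): bi-modal scheduler, i.e. a constant delay when
        # the buffer is full, otherwise request immediately.
        delay = self.idle_delay_s if buffer_s >= max_buffer_s else 0.0
        return self.level, delay
```

For example, with levels of 1, 2 and 4 Mbit/s, a first measurement of 2.1 Mbit/s keeps the client at 1 Mbit/s, because 2.1 Mbit/s lies inside the dead zone above the 2 Mbit/s level. Stepping down immediately but stepping up only with headroom trades some adaptability for fewer switches.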
PANDA (Li et al., 2014) is an advanced variant of the four-step model with two distinct modifications. In the estimation step, it uses a more proactive probing mechanism, which is designed to minimize video bit-rate oscillations. The second modification is at the scheduling step, where a more sophisticated scheduler drives the buffer level towards the maximum buffer occupancy level $B_{max}$. At the same time, the inter-request time is matched to the time needed to complete the download, based on the smoothed estimate of the available bandwidth.
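One PANDA iteration can be sketched as below, following our reading of the update rules in Li et al. (2014). The function name `panda_step` and all default parameter values (probing gain `kappa`, additive increase `w`, smoothing `alpha`, buffer gain `beta`, dead-zone `eps`, target buffer `b_target`, 4 s segments) are illustrative assumptions, not the parameterization used in this study; in particular, `b_target` stands in for whatever buffer target the scheduler drives towards.

```python
from typing import List, Tuple


def panda_step(x_hat: float, y_hat: float, x_tilde: float, t_prev: float,
               bitrates: List[float], prev_rate: float, buffer_s: float,
               kappa: float = 0.14, w: float = 0.3e6, alpha: float = 0.2,
               beta: float = 0.2, eps: float = 0.15, b_target: float = 26.0,
               seg_dur: float = 4.0) -> Tuple[float, float, float, float]:
    """Hypothetical sketch of one PANDA iteration (assumed parameters).

    x_hat: previous probing estimate (bit/s); y_hat: previous smoothed value;
    x_tilde: TCP throughput measured for the last segment (bit/s);
    t_prev: last inter-request time (s); prev_rate: last selected bit-rate.
    """
    # 1) Estimation: grow by up to kappa * w per second while the measured
    #    throughput keeps up with the estimate, shrink when it falls below.
    x_hat = x_hat + kappa * (w - max(0.0, x_hat - x_tilde + w)) * t_prev
    # 2) Smoothing: EWMA discretized over the inter-request time.
    y_hat = y_hat - alpha * (y_hat - x_hat) * t_prev
    # 3) Dead-zone quantization between r_up and r_down.
    r_up = max((r for r in bitrates if r <= y_hat * (1 - eps)), default=bitrates[0])
    r_down = max((r for r in bitrates if r <= y_hat), default=bitrates[0])
    if prev_rate < r_up:
        rate = r_up
    elif prev_rate <= r_down:
        rate = prev_rate
    else:
        rate = r_down
    # 4) Scheduling: time to fetch one segment at the smoothed rate, plus a
    #    term that lengthens (or shortens) the wait when the buffer is above
    #    (or below) the target. The caller waits max(t_target, download time).
    t_target = rate * seg_dur / y_hat + beta * (buffer_s - b_target)
    return x_hat, y_hat, rate, t_target
```

Because the estimate only grows by a bounded increment per second, a single high throughput sample cannot trigger a large up-switch; this is the mechanism that damps oscillations.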
2.2. Buffer-based adaptation
BBA is a well-known buffer-based adaptation algorithm. In (Huang et al., 2014), the authors introduce a segment map based on the average segment size of every representation. The map is defined by two thresholds: i) an upper threshold above which the policy selects the maximum available quality ($R_{max}$), and ii) a lower threshold below which the policy selects the lowest available quality ($R_{min}$). In the buffer region between these thresholds, the policy may use any non-decreasing function to select the quality of the next requested segment.
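A BBA-style segment map can be sketched as follows. The function name `bba_select` and the threshold values in the example are hypothetical, and a linear map is assumed here as one admissible non-decreasing function between the two thresholds.

```python
from typing import List


def bba_select(buffer_s: float, avg_seg_sizes: List[float],
               lower_s: float, upper_s: float) -> int:
    """Hypothetical sketch of a BBA-style segment map (linear map assumed).

    avg_seg_sizes: average segment size of each representation, ascending.
    lower_s / upper_s: lower and upper buffer thresholds in seconds.
    Returns the index of the representation to request next.
    """
    if buffer_s <= lower_s:
        return 0                          # below the lower threshold: lowest quality
    if buffer_s >= upper_s:
        return len(avg_seg_sizes) - 1     # above the upper threshold: highest quality
    # Between the thresholds: map the buffer level linearly onto a target
    # segment size.
    frac = (buffer_s - lower_s) / (upper_s - lower_s)
    target = avg_seg_sizes[0] + frac * (avg_seg_sizes[-1] - avg_seg_sizes[0])
    # Pick the largest representation whose average segment fits the target.
    return max(i for i, s in enumerate(avg_seg_sizes) if s <= target)
```

With average segment sizes of 0.5, 1, 2 and 4 MB and thresholds of 4 s and 12 s, a buffer of 8 s maps to a target of 2.25 MB and selects the third representation. Note that if the upper threshold is set above the maximum buffer (for instance 20 s with a 16 s buffer), the buffer can never reach it, so the highest representation is never selected.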
BOLA (Spiteri et al., 2016) is a buffer-based adaptation algorithm that uses Lyapunov optimization to select the video bit-rate of each segment. In practice, the algorithm is designed to maximize a joint utility function that rewards an increase in the average quality and penalizes potential re-buffering occurrences. A variant called BOLA-O mitigates video bit-rate oscillations by introducing a form of bit-rate capping when switching to higher bit-rates.
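The basic BOLA decision rule can be sketched as below, based on our reading of Spiteri et al. (2016): for each representation $m$ with segment size $S_m$ and utility $v_m = \ln(S_m / S_1)$, the client maximizes $(V(v_m + \gamma p) - Q)/S_m$, where $Q$ is the buffer level in segments. The function name, the values of `V` and `gamma_p`, and the omission of the BOLA-O capping step are assumptions of this sketch.

```python
import math
from typing import List, Optional


def bola_select(buffer_segments: float, seg_sizes: List[float],
                V: float, gamma_p: float) -> Optional[int]:
    """Hypothetical sketch of the basic BOLA decision rule.

    buffer_segments: current buffer level Q, in segments.
    seg_sizes: segment size S_m of each representation, ascending.
    V, gamma_p: control parameters (illustrative values only).
    Returns the representation index, or None to wait before downloading.
    """
    best_idx: Optional[int] = None
    best_score = -math.inf
    for m, size in enumerate(seg_sizes):
        utility = math.log(size / seg_sizes[0])   # v_m = ln(S_m / S_1)
        score = (V * (utility + gamma_p) - buffer_segments) / size
        if score > best_score:
            best_idx, best_score = m, score
    # A negative best score means the buffer is already full enough: idle.
    return best_idx if best_score >= 0 else None
```

With segment sizes 1, 2 and 4, `V = 10` and `gamma_p = 1`, the rule selects the lowest representation at an empty buffer, the middle one at 8 segments and the highest at 12 segments, and it waits at 30 segments. The quality choice is driven purely by the buffer level, without any throughput estimate.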
2.3. Time-based adaptation
Download time is considered a higher-level parameter than throughput; thus, in this study, time-based adaptation is treated as a separate class of algorithms. ABMA+ (Beben et al., 2016) is an adaptation and buffer management algorithm that selects the video representation based on the predicted probability of a video stall. The algorithm continuously estimates the segment download time and uses a pre-computed play-out buffer map to select the highest video representation that guarantees smooth play-out. The segment download time estimation relies on the same probing mechanism as the throughput-based methods, but ABMA+ also takes VBR aspects into account.
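The idea of selecting a representation from a predicted stall probability can be illustrated with a simplified, hypothetical sketch. It is not ABMA+ itself: instead of a pre-computed buffer map, it estimates online, from the last 50 throughput probes, how often a segment of each representation would take longer to download than the current buffer lasts. The function names, the 5% target probability and the use of average segment sizes are assumptions.

```python
from typing import List


def stall_probability(seg_bits: float, buffer_s: float,
                      probes_bps: List[float]) -> float:
    """Fraction of recent probes under which downloading a segment of
    `seg_bits` would take longer than the current buffer lasts."""
    too_slow = sum(1 for t in probes_bps if seg_bits / t > buffer_s)
    return too_slow / len(probes_bps)


def time_based_select(avg_seg_bits: List[float], buffer_s: float,
                      probes_bps: List[float], p_target: float = 0.05,
                      window: int = 50) -> int:
    """Hypothetical sketch: highest representation whose estimated stall
    probability stays below p_target (assumed value)."""
    recent = probes_bps[-window:]          # sliding window of the last probes
    chosen = 0                             # fall back to the lowest representation
    for level, seg_bits in enumerate(avg_seg_bits):
        if stall_probability(seg_bits, buffer_s, recent) <= p_target:
            chosen = level
    return chosen
```

For 4 s segments at 1, 2 and 4 Mbit/s (4, 8 and 16 Mbit per segment) and probes of 4 Mbit/s, a 3 s buffer allows only the 2 Mbit/s level (download times of 1, 2 and 4 s), whereas a 5 s buffer allows the 4 Mbit/s level.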
3. Experimental framework
3.1. Selected network data-sets
The performance of HAS algorithms is highly correlated with the network conditions during streaming. As opposed to fixed networks, mobile networks are characterized by intense throughput variation. Additionally, due to diverse coverage quality, there may be areas with prolonged low bandwidth, which result in throughput outages and therefore in an increased probability of a video stall. To avoid stalls, an HAS algorithm needs, at the very least, a network profile whose mean throughput is higher than the bit-rate of the lowest representation stored at the server. Otherwise, the buffer may be completely depleted, leading to a stall event.
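The buffer-depletion argument can be illustrated with a toy fluid simulation. This is a hypothetical sketch, not part of the study's framework: the function name `simulate_stalls`, the one-second trace granularity, the 16 s buffer cap and the 4 s resume threshold are assumptions, and segment boundaries are ignored.

```python
from typing import List


def simulate_stalls(trace_bps: List[float], bitrate_bps: float,
                    max_buffer_s: float = 16.0, resume_s: float = 4.0) -> int:
    """Count stall events when streaming one fixed representation over a
    per-second throughput trace (fluid approximation, assumed parameters)."""
    buffer_s = 0.0
    playing = False
    stalls = 0
    for tput in trace_bps:
        # Seconds of video fetched during this second; downloading pauses
        # once the buffer is full.
        buffer_s = min(max_buffer_s, buffer_s + tput / bitrate_bps)
        if playing:
            buffer_s -= 1.0                # one second of play-out
            if buffer_s <= 0.0:            # buffer depleted: stall
                buffer_s = 0.0
                playing = False
                stalls += 1
        elif buffer_s >= resume_s:         # initial start-up or resume after a stall
            playing = True
    return stalls
```

With a constant 2 Mbit/s trace and a 1 Mbit/s representation, the buffer never drains and no stall occurs. With a constant 0.5 Mbit/s trace, the mean throughput is below the representation bit-rate, so the buffer drains by 0.5 s every second of play-out and the client stalls repeatedly (3 times in 60 s under these assumptions).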
To obtain insightful results for our comparison, we chose two diverse throughput profiles for our simulations, representative of a normal and a challenging network profile in vehicular environments. These profiles correspond to direct throughput measurements from a bus and an underground metro, respectively (H. Riiser, 2013). We investigate mobile networks because, as opposed to fixed networks, they stress the adaptation methods under highly challenging conditions. In particular, we preferred 3G traces: although LTE is more contemporary, it offers higher throughput, which is not always experienced by the user. Additionally, we use an artificial profile with controlled network conditions to validate our implementations. Fig. 1 shows the cumulative distribution function (CDF) of all studied network profiles.
The controlled profile corresponds to a High-Low-High network profile inspired by (DASH Industry Forum, 2014). This profile is shown in Fig. 3 and is characterized by distinct, controlled increases and decreases of the total throughput every 30 s. The normal profile is illustrated in Fig. 2. It is characterized by significant bandwidth variation, as expected from real networks; in general, a high throughput is sustained and no significant outages appear. This profile was chosen as representative of a vehicular terrestrial network and corresponds to the "bus" data-set described in (H. Riiser, 2013), consisting of 5 traces after excluding those that showed long outages. The challenging profile corresponds to measurements made on an underground metro and consists of a selection of 7 traces which, in general, show a low throughput throughout the route, with a long outage period when the metro enters a tunnel towards the end of the trace, as depicted in Fig. 2. This profile allows us to stress the selected HAS algorithms and test their performance under difficult and extreme conditions. We expect to see an increased re-buffering frequency with this "underground" trace-set.
3.2. Streaming content
As streaming content, we chose 3 representative open movies that are commonly used for testing video codecs and streaming protocols and are recommended in the measurement guidelines of the DASH Industry Forum (DASH Industry Forum, 2014). The first movie is Big Buck Bunny (BBB), a high-motion computer-animated movie of 9:56 min duration. The second is The Swiss Account (TSA), a sports documentary with regular motion scenes and a duration of 57:34 min. The third is Red Bull Play Street (RBPS), a sports show with high-motion scenes and a duration of 1:37 h. For all movies we used the video encodings of (ITEC, 2016) to obtain the representation levels. The video bit-rate levels were selected based on quantiles of the CDF of the normal network profile (Table 1), with a segment duration of 4 s. The representations were chosen so that the minimum representation level is sustainable for 99.9% of the normal profile. This naturally leads to a very small probability of re-buffering for that profile, yet it serves as a good basis for the comparison with the challenging profile. Additionally, we chose a high number of distinct representations to make quality switches smoother. QoE studies (Seufert et al., 2015) suggest that adaptation amplitude is the dominant adaptation factor, which means that finer-granularity switching may compensate for a higher switching frequency. One movie, chosen at random to ensure unbiased statistics, was used per trace, and it was looped if the trace was longer than the movie.
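Quantile-based ladder selection can be sketched as below, assuming a throughput trace given as a list of samples and the nearest-rank definition of the empirical quantile. The helper names and the quantiles in the example are hypothetical; the quantiles actually used are those listed in Table 1.

```python
import math
from typing import List


def empirical_quantile(samples: List[float], q: float) -> float:
    """Nearest-rank empirical quantile, 0 < q <= 1."""
    xs = sorted(samples)
    rank = max(1, math.ceil(q * len(xs)))
    return xs[rank - 1]


def representation_ladder(samples: List[float], quantiles: List[float]) -> List[float]:
    """One representation bit-rate per requested quantile of the throughput CDF."""
    return [empirical_quantile(samples, q) for q in quantiles]


def sustainable_fraction(samples: List[float], bitrate: float) -> float:
    """Fraction of throughput samples at or above a representation bit-rate."""
    return sum(1 for s in samples if s >= bitrate) / len(samples)
```

For instance, choosing the lowest level at the 0.1% quantile of the normal profile is what makes it sustainable for about 99.9% of that profile's samples, as stated above.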
3.3. Client model and metrics
The client model consists of the maximum buffer level and the HAS algorithm that the player deploys during a video streaming session. We ran our experiments over 12 mobile traces (7 normal network traces and 5 challenging network traces). Furthermore, we investigated the maximum buffer occupancy $B_{max}$, since different applications (live, VoD, short clips or long movies) may target different maximum buffer occupancy levels. In particular, we repeated the experiment for a small $B_{max} = 16$ s (4 segments), which simulates a live streaming service, and for a larger $B_{max} = 92$ s (23 segments), which simulates VoD. These values were selected based on measurements of the maximum buffer level of a popular streaming service that offers both live and VoD streaming. We assert that our results hold for any buffer value larger than 4 segments, but leave a full study of the impact of the buffer level to future work. Two further parameters that may affect the user's QoE are the initial buffering (i.e., the number of segments that must be downloaded into the buffer before play-out can start) and the re-buffering threshold (i.e., the number of segments that must be downloaded into the buffer before play-out can resume after a stall event). Both parameters were set equal to segments in our experiments, as indicated in most of the implementation guidelines of the proposed algorithms.
Although a unified framework for measuring QoE is missing from the literature, several related works (Seufert et al., 2015; Oyman and Singh, 2012) suggest that adaptability, instability and un-smoothness of streaming are the most important elements for quantifying QoE in an objective manner. Inspired by (Beben et al., 2016), we selected the following metrics for our comparison.
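Adaptability (A) is the average, over the $N$ segments of a stream, of the selected video bit-rate $r_i$ of segment $i$ divided by the minimum of the average throughput $\bar{C}_i$ available during the download of that segment and the maximum available representation bit-rate $r_{max}$:

$$A = \frac{1}{N}\sum_{i=1}^{N} \frac{r_i}{\min\left(\bar{C}_i,\, r_{max}\right)}.$$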
This metric may take values above 1, when the algorithm is aggressive, which may lead to un-smoothness.
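Adaptability can be computed from per-segment logs as sketched below. The function name is hypothetical, and averaging the per-segment ratios (rather than taking a ratio of averages) is our interpretation of the definition.

```python
from typing import List


def adaptability(selected_bps: List[float], avg_tput_bps: List[float],
                 max_rep_bps: float) -> float:
    """Mean over segments of r_i / min(C_i, r_max) (per-segment ratios assumed)."""
    ratios = [r / min(c, max_rep_bps) for r, c in zip(selected_bps, avg_tput_bps)]
    return sum(ratios) / len(ratios)
```

A value above 1, e.g. selecting 4 Mbit/s while only 2 Mbit/s is available, flags the aggressive behaviour mentioned above.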
Instability comprises the adaptation frequency and, complementary to it, the adaptation amplitude. Adaptation frequency (AF) is the number of representation switches over the total number of segments $N$, given by

$$AF = \frac{1}{N}\sum_{i=2}^{N}\left(1 - \delta_{r_i,\, r_{i-1}}\right),$$

where $r_i$ is the bit-rate of the representation selected for segment $i$ and $\delta$ is the Kronecker delta. Adaptation amplitude (AA) is the normalized average distance, in terms of bit-rate, between the representation levels before and after a switch.
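Both instability metrics can be sketched from the sequence of selected bit-rates. The function names are hypothetical, and normalizing the mean switch distance by the span of the representation ladder is an assumption, since the normalization is not specified here.

```python
from typing import List


def adaptation_frequency(selected_bps: List[float]) -> float:
    # Count i >= 2 with r_i != r_{i-1} (i.e. 1 - Kronecker delta), divided by N.
    switches = sum(1 for prev, cur in zip(selected_bps, selected_bps[1:]) if cur != prev)
    return switches / len(selected_bps)


def adaptation_amplitude(selected_bps: List[float], ladder_bps: List[float]) -> float:
    # Assumed normalization: mean absolute bit-rate jump per switch divided by
    # the span (max - min) of the representation ladder.
    jumps = [abs(cur - prev) for prev, cur in zip(selected_bps, selected_bps[1:])
             if cur != prev]
    if not jumps:
        return 0.0
    return (sum(jumps) / len(jumps)) / (max(ladder_bps) - min(ladder_bps))
```

For the sequence 1, 1, 2, 4, 4 Mbit/s on a 1/2/4 Mbit/s ladder there are 2 switches in 5 segments (AF = 0.4) with jumps of 1 and 2 Mbit/s (AA = 1.5/3 = 0.5).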
When considering un-smoothness, we must take into account both the duration and the frequency of re-buffering events. Re-buffering duration (RD) is the total duration of the re-buffering events in a stream of $N$ segments over the length $D$ of the played-out video,

$$RD = \frac{1}{D}\sum_{i=1}^{N} I_i\left(t_i^{e} - t_i^{s}\right),$$

where $I_i = 1$ if a re-buffering event occurred during the download of segment $i$ and $I_i = 0$ otherwise, and $t_i^{e}$ and $t_i^{s}$ are, respectively, the end and start times of that re-buffering event. Re-buffering frequency (RF) is the number of re-buffering events that occurred in a stream over the number of segments $N$,

$$RF = \frac{1}{N}\sum_{i=1}^{N} I_i.$$
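Both un-smoothness metrics follow directly from a log of re-buffering events, as in the sketch below. The function name and the event representation are hypothetical; at most one event per segment is assumed, matching the indicator $I_i$.

```python
from typing import List, Tuple


def rebuffering_metrics(events: List[Tuple[float, float]], n_segments: int,
                        video_duration_s: float) -> Tuple[float, float]:
    """Return (RD, RF); each event is (t_start, t_end) in seconds for a
    segment with I_i = 1 (at most one event per segment assumed)."""
    rd = sum(end - start for start, end in events) / video_duration_s
    rf = len(events) / n_segments
    return rd, rf
```

Two stalls of 2 s and 6 s in a 10-segment, 40 s stream give RD = 8/40 = 0.2 and RF = 2/10 = 0.2.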
4. Results
In this section we evaluate the performance of each adaptation algorithm based on the metrics introduced in Section 3. No single metric is conclusive on its own; a combination of the QoE metrics is required to evaluate the algorithms. The scope of this paper is not to introduce a QoE model but to present the raw results of the selected metrics.
4.1. Implementation validation
Fig. 3 shows the throughput profile (controlled) that was used to validate our implementations. A first difference is that buffer-based adaptation starts with a low representation and gradually increases it as the buffer fills up. Throughput-based methods, on the other hand, estimate the available throughput through probes and match it to the corresponding representation level. The time-based algorithm starts with a low representation until it has collected enough probes to estimate the download time appropriately. The second significant difference is that buffer-based and time-based adaptation may select a representation higher than the available throughput for a short period, as potential throughput drops have not yet affected the buffer level or been registered in the time-probing sample, respectively. The throughput-based algorithms, in contrast, can be more reactive. Time-based adaptation shows a small delay in adapting to the current throughput, as the throughput variation is registered in the time-probing sample as an average of the last 50 probes. It is worth noting that all studied algorithms were designed either heuristically or with pre-selected parameters. In all our implementations we used the parameterization proposed by the designers, but better results could be achieved if the parameters were fine-tuned.
Fig. 4 shows that buffer-based algorithms achieve higher adaptability in normal conditions. By design, they are more successful in sustaining high representation levels, even higher than the available throughput, as their adaptability exceeds 1. Throughput-based and time-based algorithms show a slightly diminished ability to match the representation to the available average throughput, due to the significant throughput variation that characterizes the selected network profiles. We also notice from this figure that the buffer size does not significantly affect adaptability.
As far as un-smoothness is concerned, Fig. 5 shows that, as expected, the probability of re-buffering is slightly higher with a small buffer (i.e., live streaming), since a small buffer has limited resilience to throughput variation. Regarding the re-buffering frequency per class of algorithms, although the performances are very close, on challenging profiles buffer-based algorithms, along with PANDA, are slightly more likely to experience a re-buffering. For normal scenarios we observe smooth streaming from almost all algorithms, due to the absence of long throughput outages in this profile and to the design of our simulations (selection of representation levels based on quantiles of the normal profile). Nevertheless, ABMA+ shows slightly increased un-smoothness compared with the rest of the algorithms in the normal profile with a small buffer, because the number of probes (50) proposed by the algorithm designers is very high compared to the maximum buffer. Therefore, a short throughput drop is not registered in the estimate before the buffer has been depleted. Fig. 8 shows the duration (amplitude) of the re-buffering events, with re-buffering events lasting about 25% of the video duration. This is expected, since the challenging profile includes underground areas that may cause network outages for 1/4 of the trace.
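The lag caused by a long probing window can be illustrated with a small calculation. This is a hypothetical sketch that uses a plain sliding mean as a stand-in for the ABMA+ estimator; the function name and the 5 to 1 Mbit/s drop are illustrative assumptions.

```python
from collections import deque


def sliding_mean_after_drop(before: float, after: float,
                            window: int = 50, steps: int = 4) -> float:
    """Sliding-mean estimate after `steps` probes at the new (lower) rate."""
    probes = deque([before] * window, maxlen=window)
    for _ in range(steps):
        probes.append(after)          # the oldest probe is discarded automatically
    return sum(probes) / len(probes)
```

After a drop from 5 to 1 Mbit/s, four new probes (roughly what a 16 s buffer of 4 s segments lasts) move a 50-probe mean only to (46 × 5 + 4 × 1)/50 = 4.68 Mbit/s, whereas a 5-probe window would already report 1.8 Mbit/s. The estimate is therefore still far too optimistic when the small buffer runs out.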
Last but not least, instability has a very significant impact on QoE (Seufert et al., 2015). Fig. 6 shows that buffer-based algorithms are about 40% more likely to make a quality switch when the buffer is small. BOLA is optimized for high efficiency, but stability is not part of the optimization, since it is addressed with a heuristic in a second phase. BBA has a pre-selected, constant upper buffer threshold, which makes the segment map less agile to throughput variation when the maximum buffer is small. In contrast, throughput-based and time-based algorithms switch quality less often, with a similar adaptation amplitude; since amplitude is complementary to frequency, both must be considered to draw a conclusion on stability. Fig. 7 shows the average normalized distance between switches. This metric shows a slight advantage in favor of the throughput-based algorithms, in both normal and challenging profiles.
It is important to stress that no metric should be treated separately; only the combination of all metrics makes our comparison insightful. Overall, our results match those in (Spiteri et al., 2016), with the addition that BOLA is compared against another buffer-based adaptation algorithm for the first time. Moreover, our results are consistent with those in (Beben et al., 2016), where PANDA and BBA are compared against ABMA+. In Table 2 we have gathered the best performing classes of algorithms per QoE element. This table can guide the selection of the most appropriate algorithmic class, depending on the application parameters (live, VoD, etc.) and the commonly experienced network conditions.
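Table 2. Best performing algorithmic class per QoE element.

| Maximum buffer | Network profile | | | | | |
|---|---|---|---|---|---|---|
| Small (16 s) | Normal | Buffer | Time | Time | Time | Throughput |
| Small (16 s) | Challenging | Buffer | Time | Time | Time | Throughput |
| Large (92 s) | Normal | Buffer | Buffer | Buffer | Time | Throughput |
| Large (92 s) | Challenging | Buffer | Buffer | Buffer | Time | Throughput |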
5. Conclusion
In this study, we evaluated the performance of five state-of-the-art adaptive streaming algorithms and made a per-class comparison based on network traces for two different throughput profiles. Additionally, we evaluated the maximum buffer occupancy factor to see how each strategy behaves with smaller and larger buffers, as different services may target different buffer levels. Our conclusion is that buffer-based approaches outperform the other classes of algorithms in terms of adaptability, yet they may lack stability, especially with small buffers, which are common in live streaming services.
This work provides first guidelines to designers and operators of HAS algorithms for the right algorithmic approach according to expected network conditions and service requirements. Designing robust HAS algorithms for high QoE under changing conditions and requirements, without relying on pre-selected designer-specific parameters or heuristic design, is still a major challenge for research.
- Beben et al. (2016) A. Beben, P. Wiśniewski, J. Mongay Batalla, and P. Krawiec. 2016. ABMA+: Lightweight and Efficient Algorithm for HTTP Adaptive Streaming. In Proc. Int. ACM Conference on Multimedia Systems (MMSys). 2:1–2:11.
- Cisco Visual Networking Index (2017) Cisco Visual Networking Index. 2017. Global mobile data traffic forecast update, 2016–2021. white paper (Feb. 2017).
- DASH Industry Forum (2014) DASH Industry Forum. 2014. Guidelines for Implementation: DASH-AVC/264 Test Cases and Vectors. report (Jan. 2014).
- H. Riiser (2013) H. Riiser, P. Vigmostad, C. Griwodz, and P. Halvorsen. 2013. Commute Path Bandwidth Traces from 3G Networks: Analysis and Applications. Proc. of MMSys 5, 1 (March 2013), 114–118.
- Huang et al. (2014) Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proc. ACM SIGCOMM.
- ITEC (2016) ITEC. 2016. Dynamic Adaptive Streaming over HTTP. (2016). http://www-itec.uni-klu.ac.at/ftp/datasets/DASHDataset2014/ URL verified on March 23, 2017.
- Jiang et al. (2012) Junchen Jiang, Vyas Sekar, and Hui Zhang. 2012. Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE. In Proc. ACM Int. Conf. on Emerg. Net. Exper. and Techn. (CoNEXT). 97–108.
- Li et al. (2014) Z. Li, X. Zhu, J. Gahm, R. Pan, H. Hu, A. C. Begen, and D. Oran. 2014. Probe and Adapt: Rate Adaptation for HTTP Video Streaming At Scale. IEEE J. Sel. Areas Commun. 32 (April 2014).
- Müller et al. (2012) Christopher Müller, Stefan Lederer, and Christian Timmerer. 2012. An Evaluation of Dynamic Adaptive Streaming over HTTP in Vehicular Environments. In Proceedings of the 4th Workshop on Mobile Video (MoVid ’12). ACM, New York, NY, USA, 37–42.
- Oyman and Singh (2012) O. Oyman and S. Singh. 2012. Quality of experience for HTTP adaptive streaming services. IEEE Communications Magazine 50, 4 (April 2012), 20–27.
- Sandvine (2015) Sandvine. 2015. Global Internet Phenomena: Asia-Pacific & Europe. report (Sept. 2015).
- Seufert et al. (2015) M. Seufert, S. Egger, M. Slanina, T. Zinner, T. Hoßfeld, and P. Tran-Gia. 2015. A Survey on Quality of Experience of HTTP Adaptive Streaming. IEEE Commun. Surveys Tuts. 17, 1 (2015), 469–492.
- Sodagar (2011) I. Sodagar. 2011. The MPEG-DASH Standard for Multimedia Streaming Over the Internet. IEEE MultiMedia 18, 4 (April 2011), 62–67.
- Spiteri et al. (2016) K. Spiteri, R. Urgaonkar, and R. K. Sitaraman. 2016. BOLA: Near-optimal bitrate adaptation for online videos. In IEEE INFOCOM.
- Thang et al. (2014) T. C. Thang, H. T. Le, A. T. Pham, and Y. M. Ro. 2014. An Evaluation of Bitrate Adaptation Methods for HTTP Live Streaming. IEEE Journal on Selected Areas in Communications 32, 4 (April 2014), 693–705.
- Timmerer et al. (2016) Christian Timmerer, Matteo Maiero, and Benjamin Rainer. 2016. Which Adaptation Logic? An Objective and Subjective Performance Evaluation of HTTP-based Adaptive Media Streaming Systems. CoRR abs/1606.00341 (2016). http://arxiv.org/abs/1606.00341