CBA: Contextual Quality Adaptation for Adaptive Bitrate Video Streaming (Extended Version)

01/17/2019 ∙ by Bastian Alt, et al. ∙ University of Central Florida

Recent advances in quality adaptation algorithms leave adaptive bitrate (ABR) streaming architectures at a crossroads: When determining the sustainable video quality one may either rely on the information gathered at the client vantage point or on server and network assistance. The fundamental problem here is to determine how valuable either information is for the adaptation decision. This problem becomes particularly hard in future Internet settings such as Named Data Networking (NDN) where the notion of a network connection does not exist. In this paper, we provide a fresh view on ABR quality adaptation for QoE maximization, which we formalize as a decision problem under uncertainty, and for which we contribute a sparse Bayesian contextual bandit algorithm denoted CBA. This allows taking high-dimensional streaming context information, including client-measured variables and network assistance, to find online the most valuable information for the quality adaptation. Since sparse Bayesian estimation is computationally expensive, we develop a fast new inference scheme to support online video adaptation. We perform an extensive evaluation of our adaptation algorithm in the particularly challenging setting of NDN, where we use an emulation testbed to demonstrate the efficacy of CBA compared to state-of-the-art algorithms.




I Introduction

Video streaming services such as Netflix, YouTube, and Twitch, which constitute an overwhelming share of current Internet traffic, use adaptive bitrate streaming algorithms that try to find the most suitable video quality representation given the client’s networking conditions. Current architectures use Dynamic Adaptive Streaming over HTTP (DASH) in conjunction with client-driven algorithms to adjust the quality bitrate of each video segment based on various signals, such as measured throughput, buffer filling, and derivatives thereof. In contrast, new architectures such as SAND [1] introduce network-assisted streaming via DASH-enabled network elements that provide the client with guidance, such as accurate throughput measurements and source recommendations. Given the various adaptation algorithms that exist in addition to client-side and network-assisted information, a fundamental question arises on the importance of this context information for the Quality of Experience (QoE) of the video stream.

The problem of video quality adaptation is aggravated in Future Internet architectures such as Named Data Networking (NDN). In NDN, content is requested by name rather than location, and each node within the network will either return the requested content or forward the request. Routers are equipped with caches to hold frequently-requested content, thereby reducing the round-trip-time (RTT) of the request while simultaneously saving other network links from redundant content requests. Several attempts to make DASH-style streaming possible over NDN exist, e.g., [2], for which the key difficulty is that traditional algorithms rarely play to the strengths of NDN where the notion of a connection does not exist. Throughput, for example, is not a trivial signal in NDN as data may not be coming from the same source.

In this paper, we closely look at the problem of using context information available to the client for video quality adaptation. Note that our problem description is agnostic to the underlying networking paradigm, making it a good fit to traditional IP-based video streaming as well as NDN. In essence, we consider the fundamental problem of sequential decision-making under uncertainty where the client uses network context information received with every fetched video segment. In Fig. 1 we show a sketch where the client adaptation algorithm decides on the quality of the next segment based on a high-dimensional network context. We model the client’s decision on a video segment quality as a contextual multi-armed bandit problem aiming to optimize an objective QoE metric that comprises (i) the average video quality bitrate, (ii) the quality degradation, and (iii) the video stalling.

Fig. 1: A standard client-based and/or network-assisted ABR streaming model (black) with the proposed Context-based Adaptation—CBA (dotted). In CBA, high-dimensional context features from the network, along with client-side information, undergo sparsity enforcement to shrink the impact of unimportant features.

One major challenge with incorporating high-dimensional network context information in video quality adaptation is extracting the information that is most relevant to the sought QoE metric. We note that the interactions within this context space become complicated given the NDN architecture, where the network topology and cache states influence the streaming session. Our approach introduces a sparse Bayesian contextual bandit algorithm that is fast enough to run online during video playback. The rationale behind the sparsity is that the given information, including network-assisted and client-side measured signals such as buffer filling and throughput, constitutes a high-dimensional context which is difficult to model in detail. Our intuition is that, depending on the client’s network context, only a few input variables have a significant impact on QoE. Note, however, that sparse Bayesian estimation is usually computationally expensive. Hence, we develop here a fast new inference scheme to support online quality adaptation.

Our contributions in this paper can be summarized as:

  • We formulate the quality adaptation decision for QoE maximization in ABR video streaming as a contextual multi-armed bandit problem.

  • We provide a sparse Bayesian contextual bandit algorithm, denoted CBA, which is computationally fast enough to provide real-world video players with quality adaptation decisions based on the network context.

  • We show emulation testbed results and demonstrate the fundamental differences to the established state-of-the-art quality adaptation algorithms, especially given an NDN architecture.

The developed software is provided here111.

The remainder of this paper is organized as follows: In Sect. II, we review relevant related work on ABR video streaming and contextual bandits. In Sect. III, we present the relevant background on ABR video streaming. In Sect. IV, we model the quality adaptation problem as a contextual multi-armed bandit problem before providing a fast contextual bandit algorithm for high-dimensional information. In Sect. V, we show how ABR streaming uses CBA and define a QoE-based reward. We describe the evaluation testbed before providing emulation results in Sect. VI. Section VII concludes the paper.

II Related Work

In the following, we split the related work into two categories: work on ABR quality adaptation, especially in NDN, and work on contextual bandit algorithms with high-dimensional covariates.

Significant research effort has been devoted to finding streaming architectures capable of satisfying high-bitrate and minimal-rebuffering requirements at scale. CDN brokers such as Conviva [3] allow content producers to easily use multiple CDNs, and are becoming crucial to meet user demand [4]. Furthermore, the use of network assistance in CDNs has received significant attention recently as a method of directly providing network details to DASH players. SAND [1] is an ISO standard which permits DASH-enabled in-network entities to communicate with clients and offer them QoS information. SDNDASH [5] is another such architecture aiming to maintain QoE stability across clients, as clients without network assistance information are prone to misjudge current network conditions, causing QoE to oscillate. Beyond HTTP, the capabilities of promising new network paradigms such as NDN pose challenges to video streaming. The authors of [2] compare three state-of-the-art DASH adaptation algorithms over NDN and TCP/IP, finding NDN performance to notably exceed that of TCP/IP under certain network conditions. New adaptation algorithms specific to NDN have also been proposed, such as NDNLive [6], which uses a simple RTT mechanism to stream live content with minimal rebuffering.

In this work, we model the video quality adaptation problem as a contextual bandit problem assuming a linear parametrization, which has successfully been used, e.g., for ad placement [7]. Another promising approach is based on cost-sensitive classification in the bandit setting [8]. Recently, [9] has discussed the use of variational inference in the bandit setting, wherein Thompson sampling is considered to cope with the exploration-exploitation trade-off. By assuming a high-dimensional linear parametrization, we make use of sparse estimation techniques. High-dimensional information arises in video streaming due to the network context. Sparsity has been a major topic in statistical modeling and many Bayesian approaches have been proposed. Traditionally, double exponential priors, which correspond to ℓ1 regularization, have been used. However, these priors often fail due to limited flexibility in their shrinkage behavior. Other approaches that induce sparsity include 'spike-and-slab' priors [10] and continuous shrinkage priors. Of the two, continuous shrinkage priors have the benefit of often being computationally faster [11]. For our approach we use the Three Parameter Beta Normal (TPBN) continuous shrinkage prior introduced by [11], which generalizes diverse shrinkage priors, e.g., the horseshoe prior [12], the Strawderman-Berger prior, the normal-exponential-gamma prior, and the normal-gamma prior.

III Adaptive Bitrate Streaming: Decisions under Uncertainty

In this section, we review the established model for quality adaptation in ABR video streaming and highlight the changes that arise when streaming over NDN.

III-A Adaptive Bitrate Streaming: A Primer

In adaptive bitrate streaming, the content provider offers multiple qualities of the same content to clients, who decide which one to pick according to their own client-side logic. Each video is divided into consecutive segments, each of which represents some fixed number of seconds of content. These segments are encoded at multiple bitrates corresponding to the perceived average segment quality. In practice, segment lengths are often chosen to be two to ten seconds [13] with several distinct quality levels to choose from, such as 720p and 1080p. Let Q = {q_1, …, q_K} represent the set of all available video qualities, where q_j < q_{j+1} for all j; i.e., a higher index indicates a higher bitrate and better quality. Let the i-th segment encoded at the j-th quality be denoted s_{i,j}.

Received video segments are placed into a playback buffer which contains downloaded, unplayed video segments. Let the number of seconds of video in the buffer when segment i is received be B_i, and let the playback buffer size BUF_MAX be the maximum allowed seconds of video in the buffer. By convention, we define B_0 = 0 and write the recursion of the buffer filling as B_i = max(B_{i−1} − t_i, 0) + τ, where t_i denotes the fetch time for the i-th segment and τ the segment length in seconds. A stalling event is ascribed to the i-th segment when t_i > B_{i−1}. Note that the recursion above holds only if B_{i−1} + τ ≤ BUF_MAX; i.e., the client is blocked from fetching new segments if the playback buffer is full. If this occurs, the client idles until the buffer has drained enough to admit the next segment before resuming segment fetching. In some related work [13], BUF_MAX is chosen between 10 and 30 seconds.
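As a concrete sketch, the buffer recursion and stalling condition above can be written as follows; the segment length and BUF_MAX values are illustrative choices, not taken from the paper's experiments.

```python
# Hypothetical sketch of the playback-buffer dynamics described above.
SEGMENT_LEN = 2.0   # seconds of content per segment (illustrative)
BUF_MAX = 30.0      # maximum buffered seconds of video (illustrative)

def step_buffer(buf, fetch_time):
    """Advance the buffer by one fetched segment.

    Returns (new_buffer, stalled, idle_time)."""
    stalled = fetch_time > buf            # playback drained the buffer
    buf = max(buf - fetch_time, 0.0) + SEGMENT_LEN
    idle = max(buf - BUF_MAX, 0.0)        # client idles if the buffer is full
    buf = min(buf, BUF_MAX)
    return buf, stalled, idle
```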

To allow the client to select a segment in the j-th quality, the client fetches a Media Presentation Description (MPD), an XML-like file with information on the available video segments and quality levels, during session initialization. After obtaining the MPD, the client may begin to request each segment according to its adaptation algorithm. In general, uncertainty exists over the segment fetch time. The most prevalent quality adaptation algorithms take throughput estimates [14] or the current buffer filling [15], or combinations and functions thereof, to make a decision on the quality of the next segment. The decision aims to find the segment quality which maximizes a QoE metric, such as the average video bitrate, or compound metrics taking the bitrate, bitrate variations, and stalling events into account.

III-B Streaming over Named Data Networking

In NDN, consumers or clients issue interests which are forwarded to content producers, i.e., origin servers, via caching-enabled network routers. These interests are eventually answered with data provided by the producer or an intermediary router cache. To request a video, a consumer will first issue an interest for the MPD of the video. Each segment is given a name in the MPD, e.g., of the form /video ID/quality level/segment number. The client issues an interest for each data packet when requesting a particular segment. Since NDN data packets are of a small, fixed size, higher-quality video segments require more data packets to encode. We do not permit the client to drop frames, so all data packets belonging to a segment must be in the playback buffer to watch that segment.

IV A Fast Contextual Bandit Algorithm for High-Dimensional Covariates

In this work, we model the problem of video quality adaptation as a sequential decision-making problem under uncertainty, for which a successful framework is given by the multi-armed bandit problem dating back to [16]. The contextual bandit problem [17] is an extension to the classic problem, where additional information is revealed sequentially. The decision-making can therefore be seen as a sequential game.

At decision step i, i.e., at the i-th segment, a learner observes a d-dimensional context variable x_{i,a} for each action a in a set of actions A. Here, the actions map to the video qualities that the client chooses from. The client chooses an action a_i, for which it observes a reward r_{i,a_i}. This reward can be measured in terms of low-level metrics such as fetching time or, as we consider later, QoE. The decision making is performed over a typically unknown decision horizon n, i.e., i ∈ {1, …, n}. Therefore, the learner tries to maximize the cumulative reward until the decision horizon. It is important to note that after each decision the learner only observes the reward associated with the played action a_i; hypothetical rewards for other actions a ≠ a_i are not revealed to the learner.
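The sequential game above can be sketched as a small simulation loop. The linear reward model, the context generator, and the uniform-random placeholder policy here are illustrative stand-ins, not the paper's adaptation algorithm; note that only the played action's reward is ever observed.

```python
import numpy as np

# Minimal simulation of the contextual bandit interaction (illustrative).
rng = np.random.default_rng(0)
d, K, n = 5, 3, 100                        # context dim, actions, horizon
beta = rng.normal(size=(K, d))             # unknown per-action parameters

total_reward = 0.0
for i in range(n):
    contexts = rng.normal(size=(K, d))     # one context vector per action
    a = rng.integers(K)                    # placeholder (random) policy
    total_reward += contexts[a] @ beta[a]  # only this action's reward
                                           # is revealed to the learner
```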

Next, we model the contextual bandit problem under the linearizability assumption, as introduced in [18]. Here, we assume that an unknown parameter β_a controls the mean reward of each action a at decision step i as E[r_{i,a}] = x_{i,a}^T β_a. We introduce the regret of an algorithm to evaluate its performance as

R(n) = E[ Σ_{i=1}^{n} ( x_{i,a_i*}^T β_{a_i*} − x_{i,a_i}^T β_{a_i} ) ],   (1)

with a_i* = argmax_{a ∈ A} x_{i,a}^T β_a. The regret compares the cumulative reward of the algorithm against the cumulative reward with hindsight. In order to develop algorithms with a small regret in the linear setting, many different strategies have been proposed. Such algorithms include techniques based on forced sampling [19], Thompson sampling [20], and the upper confidence bound (UCB) [18, 7, 21, 22].

Network-assisted video streaming environments provide high-dimensional context information, so it is natural to assume a sparse parameter β_a. We therefore impose a sparsity-inducing prior on the sought regression coefficients β_a. To cope with the contextual bandit setting, we start with the Bayes-UCB algorithm for linear bandits introduced in [23] and develop a version which fits the given problem. Since previously developed sparse Bayesian inference algorithms are computationally expensive, we develop a fast new inference scheme for the contextual bandit setting.

IV-A The Contextual Bayes-UCB Algorithm - CBA

The Contextual Bayes-UCB algorithm (CBA-UCB) selects in each round the action which maximizes the index

q_{i,a} = Q(1 − 1/(i (log n)^c), ρ_{i,a}),   (2)

where c is a width parameter for the UCB and Q(p, ρ) is the quantile function associated with the distribution ρ, i.e., P_{X∼ρ}(X ≤ Q(p, ρ)) = p, with p ∈ (0, 1). Additionally, we denote by ρ_{i,a} the posterior distribution of the mean reward

ρ_{i,a} = p(x_{i,a}^T β_a | D_{i,a}),   (3)

where D_{i,a} is the set of data points of contexts and rewards for which action a was previously played.
In the following subsections, we derive a Gaussian distribution N(μ_a, Σ_a) for the posterior distribution of the regression coefficients β_a. In this case the index in (2) reduces to

q_{i,a} = x_{i,a}^T μ_a + sqrt(2 x_{i,a}^T Σ_a x_{i,a}) · erf^{−1}(2p − 1),   (5)

where the quantile function computes to Q(p, N(μ, σ²)) = μ + σ √2 erf^{−1}(2p − 1), with the inverse error function erf^{−1} and p the quantile level from (2). The algorithm for CBA-UCB is depicted in Fig. 2, Alg. 1.
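Under a Gaussian posterior, the index is simply a posterior quantile, which can be evaluated via the normal inverse CDF. In this sketch the quantile-level schedule p = 1 − 1/i is an assumed stand-in for the paper's width-parameter schedule, not its exact choice.

```python
from statistics import NormalDist

# Sketch of the Gaussian CBA-UCB index: the quantile of the posterior
# N(mu, sigma^2) of the mean reward, i.e. mu + sigma*sqrt(2)*erfinv(2p-1),
# evaluated here via the normal inverse CDF.
def cba_ucb_index(mu, sigma, i):
    p = 1.0 - 1.0 / max(i, 2)   # assumed quantile-level schedule
    return NormalDist(mu, sigma).inv_cdf(p)
```

Note that the index grows with the posterior standard deviation, so rarely-played actions with uncertain reward estimates are favored for exploration.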

Algorithm 1:  main-routine CBA-UCB with Gaussian Posterior
Input: n (decision horizon), c (UCB width parameter), μ_a, Σ_a (initial parameters for all actions a)
for i = 1 to n do
       observe contexts x_{i,a} for each action a
       for a = 1 to |A| do
              compute the index q_{i,a} with Eq. (5)
       end for
       play action a_i = argmax_a q_{i,a}; observe reward r_{i,a_i}
       call a subroutine to update the estimates for μ_{a_i} and Σ_{a_i}
end for
Algorithm 2:  sub-routine SVI
Input: X (design matrix), y (response vector), hyper-parameters, step size schedule
Output: μ, Σ (updated parameters)
Initialize all natural parameters and the iteration step k
while ELBO not converged do
       draw a random sample from the data
       calculate intermediate parameters with Eq. (15)
       do gradient update with Eq. (14) and step size ρ_k
       update the variational parametrization with Eq. (16)
       update the moments with Eq. (11)
end while
return μ, Σ
Algorithm 3:  sub-routine VB
Input: X (design matrix), y (response vector), hyper-parameters
Output: μ, Σ (updated parameters)
Initialize variational parameters with Eq. (11)
while ELBO not converged do
       update the variational parameters with Eq. (10)
       update the variational moments with Eq. (11)
end while
return μ, Σ
Algorithm 4:  sub-routine OS-SVI
Input: x (context vector for the last played action), r (reward for the last played action), hyper-parameters, step size, current decision step
Output: μ, Σ (updated parameters)
calculate intermediate parameters with Eq. (15)
do gradient update with Eq. (14) and the step size
update variational parameters with Eq. (16)
return μ, Σ
Fig. 2: The CBA-UCB Algorithm with three Bayesian inference schemes for the regression coefficients: Variational Bayesian Inference (VB), Stochastic Variational Inference (SVI) and One Step Stochastic Variational Inference (OS-SVI).

IV-B Generative model of the linear rewards

Here, we derive the posterior inference for the regression coefficients. The posterior distributions are calculated for each of the actions. For the inference of the posterior (3), we use Bayesian regression to infer the posterior of the regression coefficients.222For readability, we drop the dependency on the action a of the regression coefficients. We use the data D, which is a set of previously observed contexts and rewards when taking action a.

Assuming a linear regression model r = x^T β + ε with i.i.d. Gaussian noise ε, the regression response follows the likelihood

p(r | x, β, α) = N(r; x^T β, α^{−1}),

where α is the noise precision for the regression problem. For the application of video streaming with high-dimensional context information, we use a sparsity-inducing prior over the regression coefficients to find the most valuable context information. We use here the Three Parameter Beta Normal (TPBN) continuous shrinkage prior introduced by [11], which puts on each regression coefficient β_j, j = 1, …, d, the following hierarchical prior:

β_j ~ N(0, τ_j),  τ_j ~ Ga(a, λ_j),  λ_j ~ Ga(b, φ).

Here, τ_j is a Gamma distributed333We use the shape and rate parametrization of the Gamma distribution. continuous shrinkage parameter that shrinks β_j as τ_j gets small. The parameter λ_j controls τ_j via the global shrinkage parameter φ. For appropriate choices of the hyper-parameters a and b, different shrinkage priors are obtained. For example, we use a = b = 1/2, which corresponds to the horseshoe prior [12]. For notational simplicity, we collect the parameters for the d context dimensions in the column vectors β, τ, and λ, respectively.

For the estimation of the global shrinkage parameter φ an additional hierarchy is used, with φ ~ Ga(1/2, ω) and ω ~ Ga(1/2, 1). For the noise precision a Gamma prior is used, α ~ Ga(c₀, d₀), with hyper-parameters c₀ and d₀. The graphical model [24] of this generative model is depicted in Fig. 3.
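The TPBN hierarchy can be sampled generatively, which helps build intuition for its shrinkage behavior. This sketch uses the shape/rate Gamma parametrization described above (a = b = 1/2 recovers the horseshoe prior); the function name and the default global shrinkage value are illustrative assumptions, and the global hierarchy on φ is omitted for brevity.

```python
import numpy as np

# Generative sketch of the TPBN shrinkage hierarchy (illustrative).
def sample_tpbn_coeffs(d, a=0.5, b=0.5, phi=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.gamma(shape=b, scale=1.0 / phi, size=d)  # lambda_j ~ Ga(b, phi)
    tau = rng.gamma(shape=a, scale=1.0 / lam)          # tau_j ~ Ga(a, lambda_j)
    return rng.normal(0.0, np.sqrt(tau))               # beta_j ~ N(0, tau_j)
```

Small draws of τ_j shrink the corresponding β_j toward zero, while the heavy tail of the hierarchy leaves room for a few large coefficients, which is exactly the sparse structure assumed for the network context.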



Fig. 3: Probabilistic graphical model for the Bayesian regression in Sect. IV-B with Three Parameter Beta Normal prior using factor graph notation. (Deterministic functions are depicted in diamond-shaped nodes and ’dot’ denotes the inner product.)

IV-C Variational Bayesian Inference (VB)

In the following, we review the general approximate inference scheme of mean field variational Bayes (VB) and the application to the linear regression with TPBN prior as proposed in [11]. Thereafter, we leverage stochastic variational inference (SVI) to develop a new contextual bandit algorithm.

Since exact inference of the posterior distribution is intractable [25], we apply approximate inference in the form of variational Bayes (VB). We use a mean field variational approximation, with q(β, τ, λ, φ, ω, α) = q(β) q(τ) q(λ) q(φ) q(ω) q(α) for the approximate distribution. The variational distributions are obtained by minimizing the Kullback-Leibler (KL) divergence between the variational distribution and the intractable posterior distribution


By Jensen’s inequality, a lower bound on the marginal likelihood (evidence) can be found as

log p(D) ≥ E_q[log p(D, θ)] − E_q[log q(θ)] =: L(q).

The evidence lower bound (ELBO) L(q) is used for solving the optimization problem over the KL divergence (7), since maximizing L(q) is equivalent to minimizing the KL divergence. Using calculus of variations [25], the solution of the optimization problem can be found with the following optimal variational distributions444GIG denotes the generalized inverse Gaussian distribution, see Appendix A.


with the parameters of the variational distributions


and the moments


where K_ν is the modified Bessel function of the second kind. The calculation of the ELBO is provided in Appendix B. Fig. 4 shows the probabilistic graphical model of the mean field approximation for the generative model. Note the factorization of the random variables, which enables tractable posterior inference in comparison to the probabilistic graphical model for the coupled Bayesian regression in Fig. 3.


Fig. 4: Probabilistic graphical model using a mean field approximation for the Bayesian regression (see Sect. IV-C).

A local optimum of the ELBO can be found by cycling through the coupled moments of the variational distributions. This corresponds to a coordinate ascent algorithm on the ELBO. The corresponding algorithm is shown in Fig. 2, Alg. 3.

IV-D Stochastic Variational Inference (SVI)

Next, we present a new posterior inference scheme with the TPBN prior based on stochastic variational inference (SVI) [26]. We optimize the ELBO by the use of stochastic approximation [27], where we follow the natural gradient of the ELBO with respect to the natural parameters of the exponential-family mean field variational distributions.

Consider the mean field approximation q(θ) = ∏_m q(θ_m) for the intractable posterior distribution p(θ | D), where θ and D denote the tuple of parameters and the data, respectively. For each factor q(θ_m), assuming it belongs to the exponential family, the probability density is

q(θ_m) = h(θ_m) exp(η_m^T t(θ_m) − A(η_m)).

Here, h denotes the base measure, η_m are the natural parameters, t(θ_m) is the sufficient statistic, and A is the log-normalizer.

We compute the natural gradient of the ELBO with respect to the natural parameters η_m of the factorized variational distributions for each variational factor q(θ_m). The natural gradient computes to

∇̃_{η_m} L = E_q[η̃_m] − η_m,

where η̃_m is the natural parameter of the full conditional distribution p(θ_m | θ_{−m}, D), with θ_{−m} denoting the tuple of all variables but θ_m. Using a gradient update, the variational approximation can be found as

η_m^{(k+1)} = η_m^{(k)} + ρ_k ∇̃_{η_m} L = (1 − ρ_k) η_m^{(k)} + ρ_k E_q[η̃_m],

where k denotes the iteration step of the algorithm and ρ_k is a step size parameter.

Random subsampling of the data enables constructing a stochastic approximation algorithm. For this, E_q[η̃_m] is replaced by an unbiased estimate η̂_m computed on a random subsample of the data, which yields a stochastic gradient ascent algorithm on the ELBO in the form

η_m^{(k+1)} = (1 − ρ_k) η_m^{(k)} + ρ_k η̂_m.   (14)

For the step size we use a schedule ρ_k satisfying the standard stochastic approximation conditions. In the case of the regression problem, we sample one data point from the set of observed data points and replicate it to the size of the data set to calculate η̂_m. The intermediate estimates of the natural parameters are then obtained by Eq. (15); the derivation is provided in Appendix C.
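The stochastic update (14) is a convex combination of the current natural parameter and the noisy estimate, which is the standard SVI form. The Robbins-Monro-style step size schedule below is an assumed example, not the paper's exact choice.

```python
# SVI-style natural-parameter update (illustrative): eta is the current
# natural parameter, eta_hat the unbiased intermediate estimate from a
# replicated subsample, and k the iteration step.
def svi_step(eta, eta_hat, k, kappa=0.7, delay=1.0):
    rho = (k + delay) ** (-kappa)        # assumed decaying step size
    return (1.0 - rho) * eta + rho * eta_hat
```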

The transformation from the natural parametrization to the variational parametrization is calculated using Eq. (16), and the moments can then be calculated with (11). We denote by η_{m,v} the v-th variable of the tuple of natural parameters η_m. The gradient update (14) with random subsampling is performed until the ELBO converges. For an algorithmic description of SVI see Fig. 2, Alg. 2.

Fig. 5: Average regret for our contextual bandit algorithms vs. the baseline (CGP and LinUCB) for a dense linear model.

IV-E One Step Stochastic Variational Inference (OS-SVI)

Since the optimization of the ELBO until convergence with both VB and SVI is computationally expensive, we present a novel one-step SVI (OS-SVI) algorithm for the bandit setting. In each round of OS-SVI the learner observes a context x_{i,a_i} and a reward r_{i,a_i} based on the taken action a_i. This data point is used to update the variational parameters of the corresponding regression coefficients by going one step in the direction of the natural gradient of the ELBO. For this we calculate the intermediate estimates (15) based on replicates of the observed data point. Thereafter, the stochastic gradient update is performed with (14). By transforming the natural parameters back to their corresponding parametric form (16), the updated mean μ_{a_i} and covariance matrix Σ_{a_i} can be found. This update step is computationally significantly faster than using VB or SVI. The OS-SVI subroutine is described in Fig. 2, Alg. 4.

IV-F Accuracy and Computational Speed of the CBA-UCB Algorithms

For the numerical evaluation of CBA-UCB with the Three Parameter Beta Normal prior, we first create data based on the linearization assumption. We use a problem with decision horizon n, d context dimensions, and |A| actions. We use two experimental setups, one with a dense regression coefficient vector and one with a sparse regression coefficient vector, i.e., only five regression coefficients are nonzero.

We compare the developed algorithm CBA-UCB in the variants VB, SVI and OS-SVI with two baseline algorithms: LinUCB [7] and CGP-UCB [22]. For CGP-UCB, we use independent linear kernels for every action. Fig. 5 and Fig. 6 show the average regret (1) for the dense and the sparse setting, respectively. For the sparse setting expected in high-dimensional problems such as network-assisted video streaming, CBA-UCB with VB yields the smallest regret. We observe in Fig. 5 that in the dense setting CGP-UCB performs best, closely followed by CBA-UCB with VB. Note that CGP-UCB performs well here, as Gaussian process regression with a linear kernel corresponds to a dense Bayesian regression with marginalized regression coefficients, and therefore matches the model under which the dense data has been created.

In Tab. 1 we show the run-times of the algorithms, where we observe that the run-times for CBA-UCB with VB / SVI and the CGP-UCB baseline are impractically high. Further, the run-time deteriorates as the dimension of the context grows, since the computational bottleneck of both VB and SVI is multiple matrix inversions in the context dimension, see Fig. 7. Fig. 8 shows the scaling of the run-time with the decision horizon for an identical setup as in Tab. 1. CGP-UCB scales badly with the decision horizon, as the kernel matrix, whose size grows with the number of already observed contexts and rewards, is inverted at every time step. Since the decision making has to be completed in the order of a few hundred milliseconds for video streaming applications, neither CBA-UCB with VB nor CGP-UCB can be computed within this timing restriction. Therefore, we resort to the OS-SVI variant of the CBA algorithm, which empirically obtains a much smaller regret than the fast LinUCB baseline algorithm, but still retains a comparable run-time performance555For updating CBA-UCB with OS-SVI or LinUCB we only have to invert a matrix once after a decision.. This renders the use of CBA with One Step Stochastic Variational Inference for network-assisted video quality adaptation feasible.

Fig. 6: Average regret for our contextual bandit algorithms vs. the baseline (CGP and LinUCB) for a sparse linear model.
Algorithm   | Sparse Setting | Dense Setting
CGP-UCB     | 638.68 s       | 643.44 s
LinUCB      | 31.24 s        | 30.70 s
CBA-OS-SVI  | 91.40 s        | 89.56 s
CBA-SVI     | 3784.00 s      | 4081.74 s
CBA-VB      | 1434.11 s      | 1760.83 s
Tab. 1: Run-times for simulations of the CBA algorithms compared to the baseline algorithms CGP-UCB and LinUCB. Simulations executed on an Intel® Xeon® E5-2680 v3 @2.5GHz machine.
Fig. 7: Run-time vs. context dimensions for a sparse linear model, with actions and a decision horizon of . CBA-UCB with SVI not shown for clarity.
Fig. 8: Run-time vs. decision horizon for a sparse linear model, with actions and features. CBA-UCB with SVI not shown for clarity.

V Video Quality Adaptation as a Contextual Bandit Problem

In the following, we model ABR streaming as a contextual bandit problem where we use our developed CBA algorithm for video quality adaptation. The action set A corresponds to the set of available bitrates, such that action a = j represents the decision to request quality q_j for the i-th segment, i.e., to request the segment s_{i,j}. Below we formulate a real-valued segment-based QoE function to represent the reward obtained by performing this action. Furthermore, we let x_{i,a} represent the network context vector corresponding to an available action a at segment i. At each segment, therefore, there will be |A| unique context vectors available.

V-A Online Video Quality Adaptation using CBA

CBA performs online video quality adaptation by calculating the index presented in (5) for each available action after observing the context vector of the action, in order to determine the optimal bitrate to request for the next segment. There are no constraints on the contents of the context vectors, allowing CBA to learn with any information available in the networking environment. Furthermore, each context feature may be either global or action-specific; for example, the current buffer filling percentage or the last 50 packet RTTs at a given bitrate, respectively. The action with the largest computed index is chosen, and a request goes out for the corresponding segment. Once the segment is received, its QoE value (defined below) is calculated and fed to CBA as the reward. CBA then updates its internal parameters before observing the next set of context vectors and repeating the process for the next segment, until the video ends.
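The playback loop just described can be sketched as follows; the bandit stub and the player/network callbacks (`observe`, `fetch`) are hypothetical stand-ins for the real player interfaces and for the index computation and posterior update.

```python
import random

# Illustrative CBA playback loop: select via an index, observe QoE, update.
class DummyBandit:
    def select(self, contexts):        # stand-in for the UCB index argmax
        return random.randrange(len(contexts))

    def update(self, context, reward): # stand-in for the posterior update
        pass

def stream(bandit, num_segments, bitrates, observe, fetch):
    for i in range(1, num_segments + 1):
        contexts = observe(bitrates)   # one context vector per quality
        j = bandit.select(contexts)
        qoe = fetch(i, bitrates[j])    # request segment i at quality j,
        bandit.update(contexts[j], qoe)  # feed the measured QoE back
```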

The performance of CBA depends upon several hyperparameters. In the description in Fig. 2, Alg. 1, we choose the UCB width parameter as it was shown to yield the most promising results [23]. As mentioned in Sect. IV, we use a = b = 1/2 to obtain the horseshoe shrinkage prior. We choose the remaining hyper-parameters to be small nonzero values such that a vague prior is obtained.

V-B Reward Formulation: Objective QoE

The calculated QoE metric is the feedback used by CBA to optimize the quality adaptation strategy. As QoE scores for a video segment may vary among users, we resort in this work to an objective QoE metric, similar to [28], which is derived from the following set of factors:

  1. Video quality: The bitrate of the segment.

  2. Decline in quality: The decrease in bitrate if the current segment is at a lower bitrate than the previous one, for two back-to-back segments.

  3. Rebuffer time: The amount of time spent with an empty buffer after choosing the segment.

The rationale behind using the decline in quality, in contrast to related work that counts quality variations, is that we do not want to penalize CBA if the player strives for higher qualities without risk of rebuffering. The importance of each component may vary based on the specific user or context, so, similar to [28], we define the QoE of a segment as a weighted sum of the above factors. Let the rebuffer time be the amount of time spent rebuffering after choosing the segment. We then define the QoE as:


where w₁, w₂, and w₃ are non-negative weights corresponding to the importance of the video quality, decline in quality, and rebuffer time, respectively. For a comparison of several instantiations of these weights, see [28].
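A minimal sketch of this weighted-sum reward, assuming illustrative weight values (the paper defers concrete instantiations to [28]):

```python
# Segment QoE as a weighted sum of bitrate, quality decline, and rebuffering.
# The default weights are illustrative assumptions, not the paper's values.
def segment_qoe(bitrate, prev_bitrate, rebuffer_s, w1=1.0, w2=1.0, w3=3.0):
    decline = max(prev_bitrate - bitrate, 0.0)  # penalize drops only,
                                                # not upward switches
    return w1 * bitrate - w2 * decline - w3 * rebuffer_s
```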

Note that the above QoE metric is independent of CBA; the bandit is only given the scalar result of the calculation. CBA is able to take arbitrary QoE metrics as input as long as they comprise a real-valued function producing the reward.

VI Evaluation of Quality Adaptation in NDN

Fig. 9: Emulation testbed for the doubles topology. Client link capacity follows a bandwidth trace, server links have a capacity of 20Mbps, and the internal cache link has a capacity of 1000Mbps. Caches can store up to 1500 Data chunks.
Fig. 10: Emulation testbed for the full topology. Client and server links have a capacity of 20Mbps, and the internal cache links have a capacity of 1000Mbps. Caches can store up to 1500 Data chunks.

To evaluate the performance of CBA and compare it with its Throughput-based (TBA) and Buffer-based (BBA) adaptation peers, we emulate two NDN topologies: the doubles topology, shown in Fig. 9, and the full topology, shown in Fig. 10. The topologies are built using an extension of the Containernet project, which allows the execution of Docker hosts as nodes in the Mininet emulator.

The NDN clients use a DASH player implemented with libdash, based on the code from [2], using a fixed set of Interest Control Protocol (ICP) parameters. We note that traffic burstiness can vary significantly depending on the ICP parameters used.

The clients begin playback simultaneously, streaming the first 200 seconds of the BigBuckBunny video encoded in two-second H.264/AVC segments offered at the quality bitrates {1, 1.5, 2.1, 3, 3.5} Mbps, with a playback buffer size of 30 seconds. All containers run instances of the NDN Forwarding Daemon (NFD) with the access strategy, and repo-ng is used to host the video on the servers and caches.

Fig. 11: Client 1 QoE over playback on the doubles topology.
Fig. 12: QoE fairness evaluation on the doubles topology.

In the following, we compare the performance of CBA in the VB and OS-SVI variants, in addition to the baseline algorithm LinUCB [7]. We also examine the performance of two state-of-the-art BBA and TBA algorithms, i.e., BOLA [15] and PANDA [14], respectively. There are many adaptation algorithms in the literature, some of which use BBA and TBA simultaneously, including [28], [29], [30], and [31]; however, BOLA and PANDA were chosen because they are widely used and achieve state-of-the-art performance in standard HTTP environments. Buffer filling percentage and quality-specific segment packet RTTs are provided to the client as context. Furthermore, we added a numHops tag to each Data packet to track the number of hops from the Data origin to the consumer.

We track the RTTs and numbers of hops of the last 50 packets of each segment received by the client, in accordance with measurements from [32]. If a segment does not contain 50 packets, results from the existing packets are resampled. As a result, each CBA algorithm is given a high-dimensional context vector constituted of the buffer fill percentage and the packet RTTs and numHops for each of the available qualities.
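The context construction above can be sketched as follows. The function name, the resample-with-replacement scheme, and the feature ordering are illustrative assumptions; the paper does not fix the exact resampling rule.

```python
import random

def context_vector(buffer_fill, per_quality_stats, n=50, rng=random):
    """Build [buffer_fill, rtts_q1..., hops_q1..., rtts_q2..., ...].

    per_quality_stats maps each quality to (rtts, hops), the lists of
    measurements for recently received packets at that quality.
    """
    def pad(xs):
        if len(xs) >= n:
            return list(xs[-n:])          # keep the last n packets
        # fewer than n packets observed: resample existing measurements
        return list(xs) + [rng.choice(xs) for _ in range(n - len(xs))]

    vec = [buffer_fill]
    for quality in sorted(per_quality_stats):
        rtts, hops = per_quality_stats[quality]
        vec.extend(pad(rtts))
        vec.extend(pad(hops))
    return vec
```

With five qualities this sketch yields a 1 + 5 * (50 + 50) = 501-entry vector, illustrating why sparsity-inducing priors matter at this dimensionality.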

Algorithm     Bitrate [Mbps]   Quality switches [#]   Switch magnitude [Mbps]   Parameter update time [ms]
CBA-OS-SVI    3.10             6                      0.57                      15
CBA-VB        2.58             6                      0.65                      325
LinUCB        2.24             14                     1.07                      6
BOLA          2.63             36                     1.19                      n/a
PANDA         2.51             16                     1.00                      n/a
Tab. 1: Client 1 streaming statistics on the doubles topology.

VI-A Results on the Doubles Topology

We modulate the capacity of the bottleneck link using truncated normal distributions: the link capacity is drawn with a mean of 7 Mbps and stays unchanged for a period whose length is drawn with a mean of 5 s. The weights in Eq. 17 are set to emphasize the importance of the average quality bitrate without allowing a large amount of rebuffering to take place. We note that the use of subjective quality evaluation tests for different users to map these weights to QoE metrics via, e.g., the mean opinion score (MOS), is out of the scope of this work.
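The bandwidth modulation can be sketched as follows. Only the means (7 Mbps capacity, 5 s period) are stated above; the standard deviations and truncation bounds used here are assumed values for illustration.

```python
import random

def truncnorm(mean, std, low, high, rng):
    """Rejection-sample a normal variate truncated to [low, high]."""
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x

def bandwidth_trace(duration, rng=None):
    """Return (capacity_mbps, hold_time_s) pairs covering `duration` seconds."""
    rng = rng or random.Random(0)
    trace, elapsed = [], 0.0
    while elapsed < duration:
        cap = truncnorm(7.0, 2.0, 1.0, 13.0, rng)   # assumed spread/bounds
        hold = truncnorm(5.0, 2.0, 1.0, 9.0, rng)   # assumed spread/bounds
        trace.append((cap, hold))
        elapsed += hold
    return trace

trace = bandwidth_trace(200.0)
```

Each entry holds the drawn capacity for its drawn period length, matching the piecewise-constant modulation described above.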

Examining Tab. 1, we see that the one-step CBA-OS-SVI yields a significantly higher average bitrate. This is expected given the QoE definition (17), but we might expect CBA-VB to pick high bitrates as well. However, the parameter update time for CBA-VB is more than 20 times that of CBA-OS-SVI, which places a delay of roughly one-sixth of each segment length, on average, between receiving one segment and requesting the next. Looking at Fig. 11, we see that CBA-VB accumulates a much larger rebuffer time than the other methods; hence, CBA-VB is forced to request lower bitrates to cope with the extra rebuffer time incurred by updating its parameters. In addition, note that LinUCB fails to select high bitrates despite having a very small parameter update time, implying that LinUCB does not adequately fit the context to the QoE and instead accumulates a large amount of regret. This is corroborated by its cumulative QoE in Fig. 11, which is nearly as poor as that of CBA-VB. By inducing sparsity on the priors and using just one sample, CBA-OS-SVI extracts the most salient features quickly enough to obtain the highest cumulative QoE of all algorithms tested.

Interestingly, the CBA approaches shown in Tab. 1 also result in the lowest number of quality switches, even though our QoE metric does not severely penalize quality variation. The magnitude of their quality switches is also nearly half that of the other algorithms.

Concerning the rebuffering behavior, we observe rebuffering ratios of {4.5%, 8.4%, 11.4%, 17.6%, 32.9%} for LinUCB, BOLA, PANDA, CBA-OS-SVI, and CBA-VB, respectively. We trace some of the rebuffering events to the ICP congestion control in NDN. Note that tuning the impact of rebuffering on the adaptation decision is not a trivial task [2]. Fortunately, this is not hardwired in CBA but rather given through (17). Hence, in contrast to state-of-the-art adaptation algorithms, CBA could learn to filter the contextual information that is most important for rebuffering by tweaking the QoE metric used.

An important consideration when choosing a quality adaptation algorithm is fairness among clients that stream simultaneously over common links. While DASH delegates this to the underlying TCP congestion control, we empirically show here how the ON-OFF segment request behavior, paired with the considered quality adaptation algorithms, impacts QoE fairness in NDN. This is fundamentally different from considering bandwidth sharing fairness in NDN, e.g., in [2]: here we are interested in QoE fairness, since the QoE metric, and not the bandwidth share, is the main driver of the quality adaptation algorithm. Fig. 12 shows the regret of QoE fairness between the two clients, where a larger regret indicates a greater difference in QoE between the clients up to a particular segment. Note that the regret is defined as a cumulative metric similar to (1). In accordance with the discussion in [33], the fairness measure used here is the binary entropy of the relative QoE of the two clients, where the QoE is given by (17). The regret is calculated with respect to the optimal fairness, attained when both clients receive equal QoE. Observe that the CBA algorithms attain a significantly lower QoE fairness regret than the other techniques.
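A minimal sketch of this entropy-based fairness regret follows, assuming positive per-segment QoE values so that the relative share lies in (0, 1); the function names are illustrative.

```python
import math

def binary_entropy(p):
    """H(p) in bits; 0 at the boundary by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def fairness_regret(qoe1, qoe2):
    """Cumulative regret 1 - H(q1 / (q1 + q2)) over segments.

    Equal QoE gives share 1/2, entropy 1, and zero per-segment regret.
    """
    regret, total = [], 0.0
    for q1, q2 in zip(qoe1, qoe2):
        share = q1 / (q1 + q2)
        total += 1.0 - binary_entropy(share)
        regret.append(total)
    return regret
```

For example, two clients with equal QoE on the first segment contribute no regret, while a 1-vs-3 split on the second segment adds 1 - H(0.25) ≈ 0.189.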

VI-B Results on the Full Topology

To evaluate the capacity of CBA to adapt to different reward functions in complex environments, we compare performance on the full topology under two sets of weights in Eq. 17: HIGH_QUALITY_WEIGHTS, identical to those used in the evaluation on the doubles topology, and NO_REBUFFERING_WEIGHTS, which places greater importance on continuous playback at the expense of video quality. We evaluate each algorithm with each weighting scheme for 30 epochs, where one epoch corresponds to streaming 200 seconds of the BigBuckBunny video. All clients use the same adaptation algorithm and weighting scheme within an epoch, and the bandits begin each epoch with no previous context information.

Inspecting Tab. 2, we observe that the performance statistics of the algorithms, even under different weighting schemes, are much closer than on the doubles topology. We attribute this to the more complicated topology, in which many more clients share network resources, leaving each client with fewer and less predictable resources. Furthermore, the average bitrate of the bandit algorithms does not change significantly across weighting schemes, and either stays the same or increases under NO_REBUFFERING_WEIGHTS. This may seem contradictory, but part (a) of Figs. 13 and 14 shows that CBA-OS-SVI tended to choose much lower bitrates with NO_REBUFFERING_WEIGHTS, and therefore accrued less rebuffer time in part (b), than with HIGH_QUALITY_WEIGHTS, indicating that CBA-OS-SVI successfully adapted to either weighting scheme within the playback window. As on the doubles topology, LinUCB failed to map the context to either weighting scheme, selecting higher bitrates and rebuffering longer with NO_REBUFFERING_WEIGHTS. Note that, for both CBA-OS-SVI and LinUCB, the cumulative rebuffer time in part (b) of Figs. 13 and 14 tapers off roughly halfway through the video, as the algorithms learn to request more appropriate bitrates.

Interestingly, CBA-VB also fails to adapt to either weighting scheme, performing nearly identically in both cases. This is a byproduct of the excessive parameter update time for CBA-VB in Tab. 2, which stems from the unpredictable nature of a larger network and the computational strain of performing up to 7 CBA-VB parameter updates simultaneously on the test machine. CBA-VB therefore spends over half of the length of each segment deciding which segment to request next, causing the long rebuffering times in part (b) of Figs. 13 and 14 and culminating in very low QoE scores regardless of the weighting scheme used. This obfuscates the underlying QoE function, preventing CBA-VB from differentiating between the weighting schemes within the time allotted. In a real-world scenario, where each client is an independent machine, we expect that CBA-VB, as well as CBA-OS-SVI and LinUCB to a lesser extent, would have parameter update times comparable to those on the doubles topology, resulting in better performance; we note, however, that evaluation in such an environment is out of the scope of this work.

Again, we see in Tab. 2 that CBA-OS-SVI switches qualities least frequently, despite neither weighting scheme explicitly penalizing quality variation. Furthermore, according to parts (c) and (d) of Figs. 13 and 14, CBA-OS-SVI and CBA-VB are both stable in the number and magnitude of quality switches across epochs, even under different weighting schemes, in contrast to the other algorithms tested.

Algorithm       Bitrate [Mbps]   Quality switches [#]   Switch magnitude [Mbps]   Parameter update time [ms]
HIGH_QUALITY_WEIGHTS
  CBA-OS-SVI    1.55             5                      0.82                      53
  CBA-VB        1.52             15                     1.16                      1254
  LinUCB        1.27             17                     1.01                      11
  BOLA          1.96             8                      0.63                      n/a
  PANDA         1.15             18                     0.56                      n/a
NO_REBUFFERING_WEIGHTS
  CBA-OS-SVI    1.55             6                      0.93                      55
  CBA-VB        1.68             12                     1.08                      1362
  LinUCB        1.43             22                     1.04                      16
  BOLA          1.92             12                     0.71                      n/a
  PANDA         1.13             17                     0.70                      n/a
Tab. 2: Client 1 streaming statistics on the full topology.
(a) CCDF of average bitrate chosen per epoch.
(b) Average cumulative rebuffer time during playback.
(c) CCDF of the number of quality switches per epoch.
(d) CCDF of the average magnitude of quality switches per epoch.
Fig. 13: Results for full topology with HIGH_QUALITY_WEIGHTS
(a) CCDF of average bitrate chosen per epoch.
(b) Average cumulative rebuffer time during playback.
(c) CCDF of the number of quality switches per epoch.
(d) CCDF of the average magnitude of quality switches per epoch.
Fig. 14: Results for the full topology with NO_REBUFFERING_WEIGHTS

VII Conclusions and Future Work

In this paper, we contributed a sparse Bayesian contextual bandit algorithm for quality adaptation in adaptive video streaming, denoted CBA. In contrast to state-of-the-art adaptation algorithms, we take high-dimensional video streaming context information and enforce sparsity to shrink the impact of unimportant features. In this setting, streaming context information includes client-measured variables, such as throughput and buffer filling, as well as network-assistance information. Since sparse Bayesian estimation is computationally expensive, we developed a fast new inference scheme to support online video quality adaptation. Furthermore, the provided algorithm is naturally applicable to different adaptive video streaming settings, such as DASH over NDN. Finally, we provided NDN emulation results showing that CBA yields higher QoE and better QoE fairness between simultaneous streaming sessions compared to throughput- and buffer-based video quality adaptation algorithms.

Appendix A The Generalized Inverse Gaussian

The probability density function of a generalized inverse Gaussian (GIG) distribution is

f(x; a, b, p) = \frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})} x^{p-1} \exp\left(-\tfrac{1}{2}(a x + b/x)\right), \quad x > 0,

where K_p denotes the modified Bessel function of the second kind. The GIG distribution with parameters (a, b, p) is a member of the exponential family with base measure h(x) = 1, natural parameters \eta = (p - 1, -a/2, -b/2), sufficient statistics T(x) = (\log x, x, 1/x), and log-normalizer A(\eta) = \log(2 K_p(\sqrt{ab})) - \tfrac{p}{2} \log(a/b). The inverse transform of the natural parameters is obtained by p = \eta_1 + 1, a = -2\eta_2, b = -2\eta_3.
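As a sanity check on the standard (a, b, p) parametrization of the GIG density, f(x) proportional to x^{p-1} exp(-(a x + b/x)/2) with normalizer (a/b)^{p/2} / (2 K_p(sqrt(ab))), the following sketch verifies numerically that the density integrates to one, evaluating K_p via its integral representation K_p(z) = int_0^inf exp(-z cosh t) cosh(p t) dt. The quadrature step sizes and truncation points are pragmatic choices, not part of the paper.

```python
import math

def bessel_k(p, z, steps=20000, t_max=20.0):
    """Trapezoidal evaluation of K_p(z) = int_0^inf e^{-z cosh t} cosh(pt) dt."""
    dt = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-z * math.cosh(t)) * math.cosh(p * t)
    return total * dt

def gig_normalizer(a, b, p):
    """Normalizing constant (a/b)^{p/2} / (2 K_p(sqrt(ab)))."""
    return (a / b) ** (p / 2.0) / (2.0 * bessel_k(p, math.sqrt(a * b)))

def integrate_gig_pdf(a, b, p, x_max=60.0, steps=60000):
    """Midpoint-rule integral of the GIG pdf over (0, x_max]."""
    norm = gig_normalizer(a, b, p)
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += norm * x ** (p - 1.0) * math.exp(-(a * x + b / x) / 2.0)
    return total * dx
```

For p = 1/2 the Bessel factor has the closed form K_{1/2}(z) = sqrt(pi/(2z)) e^{-z}, which provides an independent check of the quadrature.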

Appendix B Calculation of the ELBO

Here, we present the calculation of the ELBO. The joint distributions involved in the calculation of the evidence lower bound (8) factorize as




Denoting by E_q the expectation w.r.t. the variational distribution q, the evidence lower bound (8) is


The expected values of the log factorized joint distribution (19) needed for (21) are


The expected values of the log factorized variational distribution (20) compute to