Sunstar: A Cost-effective Multi-Server Solution for Reliable Video Delivery

12/01/2018 ∙ by Behnaz Arzani, et al.

In spite of much progress and many advances, cost-effective, high-quality video delivery over the Internet remains elusive. To address this ongoing challenge, we propose Sunstar, a solution that leverages simultaneous downloads from multiple servers to preserve video quality. The novelty in Sunstar is not so much its use of multiple servers as the design of a scheduler that balances improvements in video quality against increases in (peering) costs. The paper's main contributions are in elucidating how various approaches to improving video quality, including the use of multiple servers, affect cost, and in incorporating this understanding into the design of a scheduler capable of realizing an efficient trade-off. Our results show that Sunstar's scheduling algorithm can significantly improve performance (by up to 50% in some instances) without increasing cost.




I Introduction

The growing importance of video traffic is by now well-documented, with the share of video traffic on North American networks exceeding of peak hour traffic in early 2016 and expected to surpass by the end of 2020 [sandvine16]. And while its dominance is not as strong on mobile access links where it now represents about of peak traffic, it is a major factor there as well. Furthermore, this growth appears unimpeded by continued progress in codec efficiency, e.g., x265 or VP9 codecs [kufa15, uhrina14, ozer16].

In the face of such a trend, or maybe because of it, there are, however, persistent problems when it comes to ensuring quality video delivery. For example, Conviva's VXR reports [vxr14, vxr15, conviva15], which track various video performance metrics each year, show that buffering events (periods during which the video stalls to replenish its playback buffer) remain common (from to depending on location), as do drops in video resolution during playback (which affect over of all videos). Equally important, degradation in video quality is also known to have a major impact on users’ behavior and their satisfaction with Internet video [krishnan12]. This is obviously a concern, as articulated in several recent industry forums focused on video delivery [cdnsummit13, cdnsummit14, cdnsummit15, cdnsummit16].

The first goal of this paper is, therefore, to explore possible solutions to remedy this situation and improve the quality of Internet video delivery. Of interest in this context is the use of multipath solutions, and in particular solutions that let video clients download video segments (chunks) simultaneously from multiple servers. We refer to such solutions as Multipath-MultiServer (MuMS), e.g., [dynamic, youtuber]. Reliance on paths from multiple distinct servers can help mitigate exposure to quality degradations caused by congestion or failure of individual paths, as well as by server overload. In particular, multipath has been shown to be useful in improving throughput and reliability in both wired and wireless networks [apostolopoulos04, Chen:2004ih, chen09, ganesan2001highly, golubchik02, Radi:2012en], and there is initial evidence that it could also benefit delay [javed09] as well as rate stability [arzani12]. The latter is of particular interest, as rate variations are a major factor in video quality degradation.

Specifically, a typical video streaming client operates in two phases [rao11]: pre-buffering and re-buffering. A video is split into segments (chunks), and in the pre-buffering phase the client requests chunks at the maximum rate allowed by the network (this holds for both video download and live streaming, though with obvious limitations on the pre-buffering phase for the latter). Once there are enough chunks in the client’s playback buffer, it starts playing the video and switches to re-buffering mode. In this mode, the client requests chunks at a fixed rate determined by the encoding rate. Video playback proceeds smoothly as long as chunks are in the playback buffer before they need to be played out. Variations in transmission capacity between the server(s) and the client can result in late delivery or loss of video chunks. This, in turn, depletes the playback buffer and eventually induces video stalls or skips.
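To make the two phases concrete, the following is a minimal fluid-model sketch, not the client used in this work. It assumes that the buffer is measured in seconds of video, that `capacity[t]` is the rate the network can deliver in step `t` as a multiple of the encoding rate, that the client downloads at full capacity while pre-buffering and at most at the encoding rate afterwards, and that `prebuffer_s` is a hypothetical start-up threshold. For simplicity, playback never returns to the pre-buffering phase after a stall.

```python
def simulate_playback(capacity, prebuffer_s=4.0, dt=1.0):
    """Total stall time (s) in a fluid model of a two-phase streaming client.

    capacity[t]: deliverable network rate during step t, as a multiple of the
    encoding rate (1.0 means exactly real time).
    """
    buffer_s = 0.0   # seconds of video currently buffered
    playing = False  # False during the pre-buffering phase
    stall_s = 0.0
    for cap in capacity:
        if not playing:
            # Pre-buffering: fetch as fast as the network allows.
            buffer_s += cap * dt
            playing = buffer_s >= prebuffer_s
            continue
        # Steady state: request at the encoding rate; the network may deliver less.
        buffer_s += min(cap, 1.0) * dt
        played = min(buffer_s, dt)
        buffer_s -= played
        stall_s += dt - played   # any shortfall is time spent stalled
    return stall_s


# Two steps of fast pre-buffering, then a sustained dip to half the encoding rate.
print(simulate_playback([2, 2] + [1] * 5 + [0.5] * 10))  # 1.0
```

The dip drains the 4 s pre-buffer at 0.5 s per step, so the last two steps of the dip stall for half a step each.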

A popular approach for dealing with variations in transmission capacity is adaptive bit rate (ABR) streaming [abr96, abr16], sometimes also called HTTP Adaptive Streaming (HAS). Under ABR/HAS, the client adjusts the encoding bit rate to match the capacity available in the network and avoid losses/delays. However, rate changes remain visible to the viewer and still translate into degraded quality of experience (QoE) [sogaard16, garcia14, seufert14], albeit to a lesser degree. As a result, it remains desirable to devise solutions that eliminate or mitigate variations in transmission rate, and preserve video quality without having to resort to (coding) rate adjustments. This is one of the goals of our solution, Sunstar, which seeks to leverage multiple paths to different servers to maintain a stable transmission rate even in the presence of network variations (on individual paths). In this respect, Sunstar is complementary to ABR-based solutions.

Another goal of Sunstar is to mitigate rate variations without increasing peering costs, i.e., the costs video providers incur from Internet Service Providers (ISPs) for delivering video to their customers. Ensuring that better video delivery does not translate into higher peering costs is particularly important given the low profit margins under which most video providers operate [ball15, bonte10]. Of concern in our context is the extent to which reliance on multiple paths might increase peering costs. The possibility of such increases is, in hindsight, intuitive given the non-linear nature of most ISPs’ charging models, i.e., most charge based on the percentile of usage in  minute intervals over a period of a month. Spreading transmissions over multiple paths means that a separate percentile is now computed on each path, which, as discussed in Section III, can result in a higher overall cost.

In summary, Sunstar’s main contributions are as follows:

A cost-neutral solution to improving video delivery. We develop a principled understanding of how different mechanisms for improving video delivery, including multipaths, contribute to higher (peering) costs. We use this understanding to develop a scheduler capable of delivering significant performance improvements with little to no impact on cost.

A video client improving users’ QoE. We implement a Sunstar video client in user space and demonstrate its benefits by quantifying its ability to minimize video stalls/skips across a broad range of network impairments. Our Emulab [emulab] results indicate that Sunstar can improve video quality by up to or more in various settings.

II Related Works

There has obviously been much work on optimizing video transmission and using multipaths to overcome network impairments. Our intent is not to provide an exhaustive review, but rather to summarize major approaches and highlight similarities and differences with this work.

II-A Video Delivery Optimizations

Adaptive bit rate (ABR) [abr96, abr16] is, as mentioned earlier, a powerful approach for mitigating the impact of network rate variations by allowing clients to correspondingly adjust their video coding rate. The main drawback of ABR is that it requires servers and caches to store multiple encodings of the same video, or codecs to be able to dynamically update their coding rate. In addition, adjustments in coding rates still produce noticeable changes in video quality [sogaard16, garcia14, seufert14]. Our goal with Sunstar is complementary to ABR, in that we aim to leverage multiple paths to different servers to minimize (network) rate variations, and therefore coding rate changes.

Caching is another popular strategy. It improves performance by moving files as close as possible to clients through caches located at the network edge. This is, however, not always effective, in part because copyright laws make much content uncacheable, and because the combination of the long tail of video popularity [longtail] and the use of ABR can lower cache efficiency. Consequently, even smart caching algorithms only boast a cache efficiency of about  [qwiltreport] (for ABR videos). Sunstar is meant to improve video delivery when video cannot be served from a local edge cache.

OpenConnect [openconnect] was proposed by Netflix. It relies on embedding appliances in ISPs’ networks to locate content closer to clients and to preemptively populate caches at off-peak hours to avoid cache warm-up and network congestion during peak hours. It calls for a partnership between content providers and ISPs, e.g., locating appliances in the ISP’s facility, which some large ISPs are reluctant to engage in as they operate competing businesses [netflixdeal].

Content filtering restricts the content available to users to what can be delivered with high quality, e.g., from caches. This is realized by applying filters that limit viewable listings to a subset of (popular) videos, or by steering users away from unpopular items [UCG]. Both approaches can result in lost revenue, e.g., removing such filters can increase the number of views by  [UCG].

Dynamic CDN switching is offered by companies such as Conviva and Cedexis, which act as brokers in the CDN domain. They measure CDN status and switch between CDN providers based on performance. The main disadvantage is that CDNs no longer control how clients are redirected to servers, which can have unintended consequences, including higher costs [broker]. In contrast, Sunstar keeps the assignment of clients to servers under the video provider’s purview and incorporates mitigating cost increases as an explicit criterion.

Hybrid CDN-P2P seeks to combine the best of CDN and peer-to-peer solutions [cdnp2p]. Netsession [netsession] offers a representative example of the potential benefits of such an approach. Unlike Sunstar, however, it does not offer explicit control over how improving performance affects a provider’s cost. Additionally, aspects such as copyright management are traditionally difficult to handle in a P2P setting.

II-B Multipath Solutions

The benefits of multipaths have been studied in numerous settings, e.g., [conga, MRTP, mmsys], but perhaps most visible among them are studies of Multipath TCP (MPTCP) [nsdi2012], whose investigations related to congestion control [conj1, conj2, analdesign, arzani14] or scheduling [pams14, mptcpscheduler14] are of most relevance, even if not directly applicable because of MPTCP’s assumption of a single source and a single destination. Nevertheless, several techniques developed to improve MPTCP can be repurposed in a MuMS setting. For example, as discussed in Section IV, Sunstar is able to leverage MPTCP’s opportunistic retransmit to improve its performance.

More directly comparable to Sunstar are works that explicitly target improving video delivery by relying on multiple servers. In most such settings, e.g., [dynamic, youtuber], the focus has, however, been on optimizing download rates, which, as we shall see, can have a significant impact on cost. Specifically, and as discussed in Section III, while the more aggressive download strategies of [dynamic, youtuber] can reduce the odds of skips and stalls, they typically result in higher costs. In contrast, Sunstar aims to realize comparable improvements in video quality, but with little to no increases in cost.

II-C Server Selection and Cost Optimizations

Another relevant body of work is that of server selection algorithms that optimize for a given metric, e.g., performance or cost [centralized, donar, donar2, ishai]. Extending those approaches to a MuMS setting is challenging, as the multipath nature of MuMS clients makes predicting variations in traffic volumes at peering links more difficult than with single-path clients. In particular, clients are now free to choose how to distribute video requests across paths. How this impacts cost and performance adds a new, non-trivial dimension to the problem.

In this context, the approach closest to Sunstar is [multihoming2]. It considers both performance and cost and adopts a cost-minimization formulation with performance as a constraint, where the performance of each CDN is based on long-term QoE measurements from clients in different regions. Given an expected request load, [multihoming2] computes a “prioritized” list of servers that a client should use when requesting content. Higher-priority servers are used first as long as they have available capacity, and a TCP-like AIMD mechanism is used to estimate the bandwidth available to each server.

Sunstar’s approach differs from that of [multihoming2] in that, rather than minimizing cost with performance as a constraint, it leverages its understanding of the relationship between cost and performance to select rate variation as its minimization target. In addition, Sunstar’s scheduler is more responsive than that of [multihoming2], which is limited to the set of servers computed by its optimization. In some sense, the scheduler of [multihoming2] is similar to the min-RTT scheduler of Section V, which, as we shall see, performs significantly worse than that of Sunstar in terms of both cost and performance.

Finally, of note in the context of cost optimization is [shapley], which attempts to account for the contributions of individual users to a  percentile cost function. Although such an approach could be used to formulate an appropriate objective function for a server selection algorithm, it requires detailed knowledge of the exact traffic patterns of each user. This is unlikely to be feasible, especially in a multipath setting where variations on a given path affect traffic on all paths.

III MuMS Benefits and Implications

Before presenting Sunstar, we first motivate a MuMS approach by showing the type of performance improvements achievable when downloading from multiple servers. Next, we offer insight into the relationship that exists between performance and (peering) cost, and in particular why reliance on multiple paths, as in MuMS, can result in higher costs. This illustrates the need for an approach that balances performance and cost, and offers a possible direction for Sunstar’s design towards realizing such a goal.

III-A Performance Benefits

A MuMS solution should improve users’ QoE as multiple servers (and the paths from those servers) are unlikely to simultaneously experience congestion or failures. To assess the significance of those gains, we compare the performance of a MuMS client to that of a single-server client in an Emulab experiment. To simplify our setup, in all cases clients use only a single path to each server.

The Emulab connection between the client and each server consists of two links separated by a shaping node with a buffer size of 50 packets. To create an environment that exercises bandwidth limitations, the available bandwidth to each server from the client was set to an average value equal to its download rate (as determined by the video player), but with variations between and . This was achieved using dummynet [dummynet] on the shaping node. Each client repeatedly downloads a large video ( minutes or more) and is assigned to a fixed set of servers where is either , , or . In all cases, video download proceeds by issuing HTTP GET requests at a rate commensurate with that of the video. In the single-server case, TCP controls the actual download rate. In the multi-server case, a standard TCP-like application-level congestion control mechanism determines the available rate from each server, and when multiple servers are available, the lowest-RTT server is selected. This represents a relatively basic “scheduler,” which nevertheless serves the purpose of demonstrating the benefits of a MuMS solution.
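The baseline “scheduler” described above can be sketched as follows: each new request goes to the lowest-RTT server that still has room in its congestion window. This is a minimal sketch assuming per-server RTT estimates and window sizes are already maintained by an application-level congestion controller; the `Server` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    rtt_ms: float         # current RTT estimate to this server
    window: int           # max outstanding requests allowed by congestion control
    outstanding: int = 0  # requests currently in flight


def pick_min_rtt(servers):
    """Return the lowest-RTT server with window space, or None if all are full."""
    candidates = [s for s in servers if s.outstanding < s.window]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.rtt_ms)


def assign_requests(servers, n_chunks):
    """Greedily assign up to n_chunks requests; returns {server name: count}."""
    counts = {s.name: 0 for s in servers}
    for _ in range(n_chunks):
        s = pick_min_rtt(servers)
        if s is None:
            break  # every window is full; wait for responses before sending more
        s.outstanding += 1
        counts[s.name] += 1
    return counts


print(assign_requests([Server("a", 20.0, 2), Server("b", 50.0, 3)], 4))
```

The policy only spills onto slower servers once the fastest server's window fills, and it ignores how variable each path is.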

In evaluating performance, we focus on two metrics of importance to video QoE, namely, stalls and skips. Clients stall whenever their playback buffer runs empty or the next chunk to play back is missing. Skips occur, usually after a stall, when the player decides to skip a missing chunk and resumes playback from a later chunk. Previous studies [QoE1, QoE2, QoE3] have established the correlation between user satisfaction and these metrics. We further verify their impact on user experience through a Mechanical Turk experiment described in Appendix A. Note that there is an inherent trade-off between the two metrics: a short chunk time-out increases skips but reduces stalls, and vice versa.
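The stall/skip trade-off can be illustrated with a toy player. This sketch assumes each chunk has a known arrival time, chunks play back to back for a fixed duration, and the player waits at most a hypothetical `timeout` for a late chunk before skipping it; it illustrates the trade-off and is not the player used in the experiments.

```python
import math


def play(arrivals, chunk_s=1.0, start=0.0, timeout=2.0):
    """Return (total_stall_seconds, skipped_chunks).

    arrivals[i] is the time chunk i reaches the client (math.inf if never).
    Chunk i is due at clock time t; if it is late, the player stalls until it
    arrives or until `timeout` seconds pass, whichever comes first, and on
    timeout it skips the chunk.
    """
    t = start
    stall = 0.0
    skipped = 0
    for arr in arrivals:
        if arr <= t:
            t += chunk_s          # on time: play it
            continue
        wait = arr - t
        if wait <= timeout:
            stall += wait         # stall until the chunk arrives, then play it
            t = arr + chunk_s
        else:
            stall += timeout      # give up after the timeout and skip the chunk
            t += timeout
            skipped += 1
    return stall, skipped


arrivals = [0, 1, 5, 3.5, math.inf]
for timeout in (0.5, 2.0, 5.0):
    print(timeout, play(arrivals[:4], timeout=timeout))
```

For the first four arrivals, a 0.5 s timeout yields `(1.0, 2)`, 2 s yields `(2.0, 1)`, and 5 s yields `(3.0, 0)`: shorter time-outs trade stall time for skipped chunks.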

Fig. 1: Average stall duration across clients.
Fig. 2: Fraction of total download time each client was stalled (across clients).

Figures 1 and 2 report, respectively, the distributions (across clients) of the average stall time and of the average fraction of time clients are stalled, for configurations with , and servers. Figure 3 focuses on the distribution of the number of skipped video chunks for the same configurations. The confidence intervals in the figure show the percent confidence interval assuming a binomial distribution for data in each bin. The figures establish the benefits of a MuMS solution, which reduces both stall duration and the number of skips. For example, more than of -server clients saw no stalls, while the number was for -server clients, and for single-server clients; and those benefits persist among clients that experienced longer stalls on average. Similarly, over of -server (-server) clients did not experience any skips, while this number drops to below for single-server clients, with again the benefits of a MuMS solution extending to the tail of the distribution.

Fig. 3: Distribution of the number of skipped chunks.

In Section V, we show how Sunstar’s more sophisticated scheduler can yield even further improvements.

III-B Cost-Performance Trade-offs

There are multiple options to improve video delivery. A simple approach is to download the video at the highest possible rate from one or more servers, as this should minimize the odds of a chunk arriving late and/or the playback buffer running empty (though this comes at the expense of larger playback buffers at the clients, and potentially wasted bandwidth when clients abandon the video partway through). This is why works such as [youtuber] focus on maximizing clients’ throughput during their “on” period (the time during which the client fills its playback buffer).

However, while the performance benefits are intuitive, it is unclear how such a scheme affects cost. On one hand, a strategy that maximizes throughput has clients leaving the system earlier, which can reduce bandwidth usage as measured over each billing interval. On the other hand, the higher download rates while clients are present can increase bandwidth usage. How these two opposing factors contribute to percentile-based costs is not obvious at first sight.

To gain a better understanding of this trade-off, we develop a simple analytical model to evaluate the impact of the download rate on peering costs. We consider a scenario where: (1) clients connect to a fixed set of $N$ servers, with a distinct peering link for each server; (2) clients download a video of size $S$ at a constant aggregate rate $R$ and distribute download requests equally across servers (the download rate for each server is $R/N$); (3) clients arrive according to a Poisson process of rate $\lambda$; and (4) peering costs follow a $p$-percentile model ($p = 95$ in a typical scenario).

Theorem 1 establishes that the higher the download rate $R$, the higher the peering cost. In other words, while a more aggressive download strategy may improve performance, it results in higher costs.

Theorem 1.

Given clients arriving according to a Poisson process of rate $\lambda$ and downloading a video of size $S$ equally from $N$ servers at an aggregate rate of $R$, the $p$-percentile peering cost is an increasing function of $R$.

Proof. Assuming a properly provisioned system, i.e., non-blocking, each server plus peering link combination behaves as an $M/G/\infty$ system whose occupancy probability is given by $P(n) = e^{-\rho}\rho^{n}/n!$, where $\rho = \lambda S/R$ (each client keeps a download open at every server for $S/R$ time units). Assuming $\rho$ is large, i.e., we are dealing with large systems, $n$ can be approximated by a normal distribution with mean and variance equal to $\rho$.

The $p$-percentile occupancy $n_p$ of the system (each client is assigned one “server” with a service/download rate of $R/N$) is then of the form $\Phi\big((n_p - \rho)/\sqrt{\rho}\big) = p/100$, where $\Phi$ is the CDF of the standard normal distribution. This implies $(n_p - \rho)/\sqrt{\rho} = c$, where $c$ is constant, e.g., for $p = 95$, $c \approx 1.645$. Thus, $n_p = \rho + c\sqrt{\rho}$, and the $p$-percentile traffic volume on the corresponding peering link is $n_p R/N$. Hence, under a $p$-percentile cost model, the peering cost for the system is $N \cdot n_p R/N = \lambda S + c\sqrt{\lambda S R}$, an increasing function of $R$. ∎
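As a quick numerical check of Theorem 1, the sketch below compares the normal-approximation cost λS + c√(λSR) with the exact Poisson p-percentile of the per-link occupancy for a few aggregate rates R. It works under the theorem's assumptions, with our own variable names (λ arrival rate, S video size, R aggregate download rate, p billing percentile), and assumes total cost is proportional to the sum of per-link p-percentile volumes, so the number of servers cancels out.

```python
import math
from statistics import NormalDist


def poisson_quantile(rho, q):
    """Smallest n with P(X <= n) >= q for X ~ Poisson(rho), pmf in log space."""
    cdf = 0.0
    n = 0
    while True:
        cdf += math.exp(n * math.log(rho) - rho - math.lgamma(n + 1))
        if cdf >= q:
            return n
        n += 1


def exact_cost(lam, size, rate, p=95):
    """Summed p-percentile link volume using the exact Poisson occupancy."""
    rho = lam * size / rate            # mean number of concurrent downloads per link
    n_p = poisson_quantile(rho, p / 100)
    return n_p * rate                  # N links, each carrying n_p * rate / N


def approx_cost(lam, size, rate, p=95):
    """Normal-approximation cost: lam*S + c*sqrt(lam*S*R)."""
    c = NormalDist().inv_cdf(p / 100)  # about 1.645 for p = 95
    return lam * size + c * math.sqrt(lam * size * rate)


for rate in (1.0, 2.0, 4.0, 8.0):
    print(rate, exact_cost(4.0, 100.0, rate), round(approx_cost(4.0, 100.0, rate), 1))
```

Both columns grow with the download rate: faster downloads shorten each client's stay, but the per-link percentile volume rises through the √R term.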

Although Theorem 1 relies on a number of simplifying assumptions, it nevertheless captures the key insight that, while increasing download rates allows clients to leave the system faster, its overall impact on cost is negative. In other words, downloading at the lowest possible rate that meets the video requirements yields the lowest cost. As described in Section IV, we leverage this insight in designing the Sunstar scheduler.

III-C Impact of MuMS on Cost

The previous section established that a greedy/aggressive download strategy has a negative impact on cost. In this section, we show that the multiple paths of a MuMS solution have a similar effect. For that purpose and without loss of generality, we consider a system with a single server reachable by clients over either one or two peering links. When two peering links are available, clients split their traffic across the two links according to some strategy. Under the two-peering-link configuration, we denote as $\mathbf{x}_1$ and $\mathbf{x}_2$ the vectors of traffic volumes recorded over successive billing intervals on links 1 and 2, respectively. Correspondingly, traffic on the peering link of the single-link configuration is denoted as $\mathbf{x}$. Ignoring the impact of short time-scale traffic fluctuations, we have $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$, and therefore $\max(\mathbf{x}) \le \max(\mathbf{x}_1) + \max(\mathbf{x}_2)$. Hence, under a peak-rate charging model, a single-path solution never yields a higher cost, and yields a strictly lower one whenever the two links do not peak in the same interval.

There are obviously additional factors at play when considering the multiple servers of MuMS clients and a percentile rather than peak charging model. Nevertheless, this captures a fundamental aspect of multipath solutions and their impact on many non-linear cost functions. Mitigating this impact is one of the goals of Sunstar.
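The following sketch illustrates the effect with a deliberately adversarial split. It assumes nearest-rank 95th-percentile billing applied separately to each link (a common convention, not necessarily every ISP's), 20 billing intervals, and made-up traffic that alternates between two links while the aggregate stays constant.

```python
import math


def percentile(values, p=95):
    """Nearest-rank p-th percentile of per-interval traffic samples."""
    ranked = sorted(values)
    k = max(1, math.ceil(p * len(ranked) / 100))
    return ranked[k - 1]


def bill(links, p=95):
    """Total cost when each link is billed on its own p-th percentile."""
    return sum(percentile(x, p) for x in links)


# 20 intervals carrying 10 units each, over one link or alternating over two.
single = [10] * 20
link1 = [10, 0] * 10
link2 = [0, 10] * 10
assert [a + b for a, b in zip(link1, link2)] == single

print(bill([single]), bill([link1, link2]))      # percentile billing: 10 20
print(max(single), max(link1) + max(link2))      # peak-rate billing: 10 20

# Counter-example: isolated spikes on separate links fall in each link's top 5%.
spike1 = [10] + [0] * 19
spike2 = [0, 10] + [0] * 18
merged = [a + b for a, b in zip(spike1, spike2)]
print(bill([merged]), bill([spike1, spike2]))    # 10 0
```

Unlike the peak-rate case, a per-link percentile is not guaranteed to cost more for every split, as the last example shows; the alternating pattern only demonstrates that it can.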

IV Sunstar Client

Figure 4 shows the overall architecture of the Sunstar client from the point of view of a single client. Sunstar runs as middleware between the video player (and its codec) and the network layer in the operating system. The Sunstar middleware initiates a series of HTTP connections to remote servers that store the video content. These remote servers are themselves part of larger server farms where the same video content is replicated on multiple servers. Servers are assigned to a client requesting a video through manifest files that the client downloads from an initial designated server, and are chosen by server selection algorithms imposed by the provider. Multiple servers within the same server farm may be selected.

The internals of the Sunstar middleware are as follows. A multi-threaded HTTP connection pool maintains connections to the selected servers. For a given video, the Sunstar middleware determines the next set of video chunks to download through each of the available HTTP connections in the connection pool. This allocation is determined by the Scheduler (§ IV-A). We revisit the scheduler formulation later in the section, but in a nutshell its goal is to allocate requests across servers so as to realize the best possible trade-off between cost and performance (video quality). Based on the results of §III-B, this boils down to reducing the download rate as much as possible to control cost, while ensuring that the client’s performance does not suffer (note that as long as video quality remains high, clients have an incentive to cooperate with any video-download strategy).

Note that, barring an unacceptably long pre-buffering phase (to build up a large playback buffer), blindly downloading at a constant low rate (as assumed in the calculations of §III-B) will result in poor performance, as it makes the playback buffer vulnerable to lulls in network bandwidth. Such lulls are unavoidable in the Internet, and the challenge is then to compensate for them with rate increases that are as small as possible while still avoiding buffer underflow. Sunstar’s scheduler relies on a mathematical model to quantify this trade-off, and uses it to determine how to adapt its download rate from each server. More formally, the scheduler aims to achieve an (average) target rate while minimizing variations around it. We peg the target rate to be at least the playback rate of the client, to make sure that the playback buffer is never empty, but choose it to be as small as possible to minimize the impact on cost. The scheduler runs periodically to determine the number of requests to be sent to each server in a given epoch to achieve the target rate with low variance.

After chunks are downloaded, they are assembled into frames in the playback buffer. As its name suggests, the Bandwidth Estimator (§ IV-B) uses the rate at which requested chunks arrive on each connection to estimate the available bandwidth to each server. The average bandwidth to each server and its variance are reported to the scheduler as inputs to its scheduling decisions.

After the scheduler determines the number of chunks that need to be requested from each server, the information is sent to the Request Manager (§ IV-C). The request takes the form of a vector with one entry per server, each entry giving the number of chunks to download from that server. Note that chunks are of fixed size, and it is the number of chunks requested from each server that varies to adapt to current/predicted network conditions. We discuss alternatives to this approach and their shortcomings in §IV-E.

In the rest of this section, we provide additional details on the main components of the Sunstar client, namely, the Scheduler, Bandwidth Estimator, and Request Manager.

Fig. 4: Sunstar architecture.

IV-A The Sunstar Scheduler

The scheduler is the core component of the Sunstar client (§V and §VI show not only that it delivers significantly better performance than other schedulers, but also that it realizes those improvements with little to no impact on cost). Its main goal is to minimize playback stalls (and therefore skips), as their frequency and duration are known to have a significant influence on QoE [QoE1, QoE2] (see also our Mechanical Turk experiments in Appendix A). Playback stalls are a direct consequence of an empty playback buffer, which arises when the download rate falls below the playback rate for an extended period of time. To minimize the odds of such occurrences, the Sunstar scheduler periodically runs an optimization that computes how many chunks to request from each server to ensure a target average download rate while minimizing variations around that average rate.

The optimization is greedy and myopic, i.e., it is based on current performance estimates for the paths to each server and does not attempt to predict future performance. This is motivated by the fact that bandwidth fluctuations are difficult to predict ahead of time, and a myopic algorithm can quickly correct course when conditions change.

The Sunstar scheduler operates in “epochs” or “rounds”, in each of which it seeks to minimize the increase in variance. This lets the optimization decide the number of requests for each server based on short-term (per-round) bandwidth estimates, and adjust them in each round to respond to changes. An alternative would rely on longer-term bandwidth estimates, but this makes the algorithm less responsive to occasional outliers (e.g., network failures/congestion events that happen every so often). The round duration is set empirically (see §V-G for details) as a compromise between responsiveness, variability of the estimates, and computation overhead (the optimization runs in each epoch). The optimization takes as input the current bandwidth estimates to each server. It then computes a target rate (number of requests) for each server in the next round.

The rest of this subsection describes the formulation of this optimization in more detail.

IV-A1 Scheduler Optimization

Let be the set of servers assigned to a MuMS client. The goal is to guarantee each client an average rate  (note that in the case of an ABR codec, this value would change whenever the codec opted to adjust the rate based on its own decision function), while minimizing changes in the running variance expressed as


where is the inverse of the time it takes for requests to arrive from server (in other words, is the inverse of the application-level round-trip time), and is the number of chunks that the client should request from server . The client’s attained rate is thus .

Minimizing changes in the running variance then translates into solving the following optimization:


where is the expectation of , and the current window size to server (both computed by the bandwidth estimator; see §IV-B). The window upper-bounds the number of requests that can be sent to each server, which enforces TCP-like congestion control. The are the decision variables and determine the number of requests allocated to each server. Let and be upper and lower bounds for , respectively. and can be estimated using Chebyshev’s inequality by identifying the upper/lower bounds on the and percentiles of the distribution of . Eq. (2) is then equivalent to:


Eq. (3) is derived by first converting the absolute value form of the problem to its linear form and then replacing the two resulting bounded constraints with tighter bounds through and .
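The sketch below solves a simplified stand-in for this kind of problem, not Eq. (3) itself. Assuming the per-chunk rates r_i from different servers are independent with mean μ_i and variance σ_i², the variance of the attained rate Σ x_i r_i is Σ σ_i² x_i²; we minimize it subject to Σ μ_i x_i equal to the target and 0 ≤ x_i ≤ w_i (the window). With a single equality and box constraints, the KKT conditions give x_i = clip(ν μ_i / (2σ_i²), 0, w_i) for a multiplier ν, found here by bisection. The function name, symbols, and independence assumption are ours.

```python
def min_variance_allocation(target, mean, var, window, iters=200):
    """Fractional chunk counts x_i minimizing sum(var_i * x_i**2) subject to
    sum(mean_i * x_i) == target and 0 <= x_i <= window_i.

    mean[i], var[i]: mean and variance of the per-chunk rate from server i
    (assumed independent across servers). Returns None if infeasible.
    """
    var = [max(v, 1e-12) for v in var]  # guard against zero variance
    if target > sum(m * w for m, w in zip(mean, window)):
        return None                      # even full windows cannot reach target

    def alloc(nu):
        # KKT stationarity 2*var_i*x_i = nu*mean_i, projected onto [0, w_i].
        return [min(max(nu * m / (2 * v), 0.0), w)
                for m, v, w in zip(mean, var, window)]

    def rate(x):
        return sum(m * xi for m, xi in zip(mean, x))

    lo, hi = 0.0, 1.0
    while rate(alloc(hi)) < target:      # grow the bracket until target is reachable
        hi *= 2
    for _ in range(iters):               # bisection on the multiplier nu
        mid = (lo + hi) / 2
        if rate(alloc(mid)) < target:
            lo = mid
        else:
            hi = mid
    return alloc(hi)


print(min_variance_allocation(10.0, [10.0, 10.0], [1.0, 4.0], [5.0, 5.0]))
```

Uncapped servers receive requests in inverse proportion to their variance (here roughly 0.8 and 0.2 chunks), and a binding window pushes the remainder onto the other servers.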

The above formulation assumes integer values for , but this can occasionally result in significant overshoots in the realized rate. We, therefore, use fractional values for . Because requests are for an integer number of chunks, we maintain a state variable that accounts for the “excess” rate to each server. After solving the optimization, we compute and use this value as the target rate to server . When this corresponds to a fractional number of chunks, we then round it up and update accordingly.
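The excess-rate bookkeeping can be sketched as a per-server carry-over accumulator, assuming fractional targets from the optimization and integer chunk requests; the names `round_requests` and `excess` are hypothetical. The invariant it maintains is that, for each server, cumulative integer requests exceed cumulative fractional targets by less than one chunk.

```python
import math


def round_requests(targets, excess):
    """Turn fractional per-server targets into integer chunk requests.

    targets[i]: fractional number of chunks the optimization assigned to server i.
    excess[i]:  chunks over-requested from server i so far, kept in [0, 1);
                updated in place.
    """
    requests = []
    for i, x in enumerate(targets):
        adjusted = x - excess[i]                     # discount earlier over-requests
        n = math.ceil(adjusted) if adjusted > 0 else 0
        excess[i] = n - adjusted                     # carry the new over-request forward
        requests.append(n)
    return requests


# Usage: over many rounds, integer requests track the fractional targets.
excess = [0.0, 0.0]
sent = [0, 0]
for _ in range(10):
    for i, n in enumerate(round_requests([0.5, 1.3], excess)):
        sent[i] += n
print(sent, excess)
```

Rounding up, rather than to the nearest integer, errs on the side of requesting slightly more, and the carried excess then suppresses requests in later rounds.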

There are two other aspects to the above optimization that need discussion, as they affect the implementation of the Sunstar scheduler.

(a) Distribution of the fraction of time clients stalled in each run.
(b) Distribution of the number of chunks skipped in each run.
Fig. 5: Comparison to single server. High and Medium , smooth bandwidth variations.

The first is dealing with infeasibility. As network bandwidth fluctuates, the optimization may not always be feasible. However, we still want the client to request the best possible transmission rate, i.e., the closest rate to the target (alternatively, this may serve as an ABR trigger to switch to a lower rate). For that purpose, we progressively decrease the target rate in the Sunstar optimization (by  in our experiments) until a feasible solution is found. Once a feasible solution is found, Sunstar attempts to bounce back to its original target as quickly as possible by increasing its target rate by in each subsequent round (while the solution remains feasible) until it can again maintain its original target rate.
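The back-off and recovery rule can be sketched as a per-round update. This assumes a feasibility predicate supplied by the caller (for example, a hypothetical check that the target does not exceed what the current windows allow) and a step size `step` that stands in for the decrement used in the experiments; the function name is ours.

```python
def next_target(original, current, feasible, step):
    """Choose this round's target rate.

    original: the rate the client ultimately wants (at least the playback rate).
    current:  the target used in the previous round.
    feasible: callable(rate) -> bool; True if the optimization has a solution.
    step:     per-round increment/decrement (hypothetical placeholder value).
    Returns the chosen target, or 0 if no positive target is feasible.
    """
    # Bounce back toward the original target by at most one step per round.
    target = min(original, current + step)
    # Back off one step at a time until the optimization becomes feasible.
    while target > 0 and not feasible(target):
        target -= step
    return max(target, 0)


# Bandwidth drops so only rates <= 7 are feasible, then recovers to allow 9.
print(next_target(10, 10, lambda r: r <= 7, 1))  # 7
print(next_target(10, 7, lambda r: r <= 9, 1))   # 8
```

Integer rates keep the example exact; with floating-point rates, the repeated subtraction would accumulate small rounding errors.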

The second concerns breaking ties. The optimization need not have a unique solution; e.g., with homogeneous paths that have sufficient bandwidth, using any of them is a feasible and optimal solution. We therefore add two tie-breaking criteria to the optimization: (1) we minimize the number of servers used, and (2) we favor those that have been used more frequently in the past. Both criteria aim to reduce the number of out-of-order chunks.
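One way to apply the two tie-breakers is a lexicographic choice over equally optimal candidate allocations. This sketch assumes the candidates have already been enumerated as `{server: chunks}` dicts and that past usage is tracked in a `Counter`; summing usage over the servers involved is our own choice of aggregation.

```python
from collections import Counter


def break_ties(candidates, usage, eps=1e-9):
    """Pick one allocation among candidates with equal objective value.

    candidates: list of {server: chunks} dicts.
    usage:      Counter of how often each server was used in past rounds.
    Prefers (1) fewer servers with a nonzero allocation, then
    (2) more past usage, summed over the servers involved.
    """
    def key(alloc):
        used = [s for s, x in alloc.items() if x > eps]
        return (len(used), -sum(usage[s] for s in used))
    return min(candidates, key=key)


usage = Counter({"a": 5, "b": 12, "c": 1})
candidates = [{"a": 1.0, "b": 1.0}, {"b": 2.0}, {"c": 2.0}]
print(break_ties(candidates, usage))  # {'b': 2.0}
```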

We show in Section V-G that the optimization of Eq. (3) is practical in that it can be solved for up to servers in around ms without overloading an entry-level machine. We also note that it is possible to improve the optimization run time by re-using past solutions as a starting point in each round [bertsimas]. Such improvements are, however, beyond the scope of this paper.

IV-B Bandwidth Estimation

It is important for the Sunstar client to utilize the available bandwidth effectively and not exacerbate congestion if/when it happens. Thus, in our design of the Sunstar client, we maintain an estimate of the bandwidth available on each path using a TCP-like mechanism, which computes a window size  (in chunks) for each path . This window provides an upper bound for the number of requests that can be sent to a server. Specifically, the window is computed using the TCP CUBIC congestion control mechanism. We chose CUBIC for bandwidth estimation because its congestion window growth depends only on the time elapsed since the last congestion event and not on the latency to the server. This matters because the client uses multiple servers, and the RTTs to these servers may differ widely. Note that CUBIC relies on losses to reduce its window size, and since chunks are requested over HTTP, there is no application-level loss in the Sunstar client. We therefore treat a drop of more than  in the estimated average rate to server  as equivalent to a CUBIC packet loss. The estimated average rate is computed using an exponential average of request completion times, which acts as a low-pass filter and eliminates estimation noise. The  threshold was chosen based on the best empirical performance observed across a sweep of the possible values.
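The per-path estimator can be sketched by combining an exponentially averaged throughput with the CUBIC window function W(t) = C(t − K)³ + W_max, where K = ((W_max(1 − β))/C)^(1/3). The constants C = 0.4 and β = 0.7 are the RFC 8312 defaults.

Several parts of this sketch are assumptions rather than details from the text:
- The drop threshold and the EWMA weight are hypothetical values.
- A "drop" is measured against the highest average seen since the last synthetic loss.
- Slow start, fast convergence and CUBIC's TCP-friendly region are omitted.

`PathEstimator` and its methods are hypothetical names.

```python
class PathEstimator:
    """Per-path window (in chunks) driven by CUBIC; a rate drop stands in for a loss."""

    C = 0.4     # CUBIC scaling constant (RFC 8312 default)
    BETA = 0.7  # CUBIC multiplicative-decrease factor (RFC 8312 default)

    def __init__(self, drop_threshold=0.3, alpha=0.125, init_window=4.0):
        self.drop_threshold = drop_threshold  # hypothetical; tuned empirically in the text
        self.alpha = alpha                    # hypothetical EWMA weight (low-pass filter)
        self.w_max = init_window              # window at the last congestion event
        self.window = init_window
        self.k = 0.0                          # time for the cubic curve to regain w_max
        self.t_last_event = 0.0
        self.avg_rate = None                  # EWMA of per-request throughput
        self.ref_rate = None                  # assumption: highest average since last event
        self.congestion_events = 0

    def current_window(self, now):
        # W(t) = C * (t - K)^3 + W_max, t measured from the last congestion event;
        # the window depends on elapsed time only, not on the path's RTT.
        t = now - self.t_last_event
        self.window = max(1.0, self.C * (t - self.k) ** 3 + self.w_max)
        return self.window

    def on_request_completed(self, now, chunk_bytes, completion_time):
        sample = chunk_bytes / completion_time  # bytes per second for this request
        if self.avg_rate is None:
            self.avg_rate = self.ref_rate = sample
            return
        self.avg_rate = (1 - self.alpha) * self.avg_rate + self.alpha * sample
        self.ref_rate = max(self.ref_rate, self.avg_rate)
        if self.avg_rate < (1 - self.drop_threshold) * self.ref_rate:
            self._congestion_event(now)
            self.ref_rate = self.avg_rate  # restart drop detection from the new level

    def _congestion_event(self, now):
        # Treat the rate drop like a CUBIC loss: remember W_max, shrink multiplicatively.
        self.w_max = self.current_window(now)
        self.window = max(1.0, self.w_max * self.BETA)
        self.k = (self.w_max * (1 - self.BETA) / self.C) ** (1 / 3)
        self.t_last_event = now
        self.congestion_events += 1
```

Right after a synthetic loss, the window equals β·W_max. It then climbs back to W_max exactly K seconds later, regardless of the server's latency.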

IV-C The Request Manager

As described earlier, the Sunstar client is an application layer protocol. Once the Sunstar scheduler computes an allocation of requests, it informs the request manager of the new allocation. The request manager maintains a list of chunks for which a request has not been sent to any of the servers. It uses a multi-threaded HTTP connection pool with open connections to each of the servers to request the appropriate number of chunks from this list from each server. As responses arrive, the request manager places each received chunk in its appropriate position in the playback buffer.
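The request manager's bookkeeping can be sketched as follows. Earliest-unrequested chunks are handed out according to the scheduler's per-server allocation, and out-of-order responses are reassembled into an in-order playback stream. This is a minimal sketch that leaves out the HTTP connection pool and threading. `RequestManager`, `dispatch`, `on_response` and `pop_playable` are hypothetical names.

```python
from collections import deque
from typing import Dict, List


class RequestManager:
    def __init__(self, num_chunks: int):
        self.unrequested = deque(range(num_chunks))  # chunks not yet sent to any server
        self.buffer: Dict[int, bytes] = {}           # chunk index -> payload, possibly out of order
        self.next_to_play = 0

    def dispatch(self, allocation: Dict[str, int]) -> Dict[str, List[int]]:
        """Assign the earliest unrequested chunks to servers per the scheduler's allocation."""
        plan = {}
        for server, count in allocation.items():
            take = min(count, len(self.unrequested))
            plan[server] = [self.unrequested.popleft() for _ in range(take)]
        return plan

    def on_response(self, chunk: int, payload: bytes) -> None:
        # Ignore chunks already played and duplicates (e.g., from a re-sent request).
        if chunk >= self.next_to_play:
            self.buffer.setdefault(chunk, payload)

    def pop_playable(self) -> List[bytes]:
        """Hand the player the contiguous in-order prefix of the buffer."""
        out = []
        while self.next_to_play in self.buffer:
            out.append(self.buffer.pop(self.next_to_play))
            self.next_to_play += 1
        return out
```

If chunk 1 arrives before chunk 0, nothing is playable until chunk 0 arrives. At that point both chunks are released in order.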

IV-D Performance Optimizations

The Sunstar client incorporates optimizations previously devised for MPTCP. In particular, it employs opportunistic retransmit. When latencies across servers are significantly different, opportunistic retransmit prevents the client from stalling while waiting for a chunk requested from a high latency server. A mechanism similar to TCP time-outs is also implemented for chunk requests. If a response is not received before a time-out, the request is re-sent to another server. Last, we limit the number of retries for chunk requests to bound playback stalls. Once this retry limit is exhausted, the chunk is skipped and the player relies on its codec to mask missing frames. This decision is implemented as part of Sunstar’s request manager.
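The time-out, retry and skip behavior can be sketched as a small policy object. A request that outlives its time-out is re-sent to a different server. A chunk that has exhausted its retry budget is skipped, leaving the codec to mask the gap. This is a minimal sketch: the time-out value, retry limit and server-rotation rule are hypothetical, and opportunistic retransmit is not modeled. `TimeoutPolicy` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outstanding:
    chunk: int
    server: str
    sent_at: float
    retries: int = 0


class TimeoutPolicy:
    def __init__(self, servers, timeout: float, max_retries: int):
        self.servers = list(servers)
        self.timeout = timeout          # hypothetical time-out, in seconds
        self.max_retries = max_retries  # hypothetical retry budget per chunk
        self.outstanding: Dict[int, Outstanding] = {}
        self.skipped: List[int] = []

    def sent(self, chunk: int, server: str, now: float) -> None:
        self.outstanding[chunk] = Outstanding(chunk, server, now)

    def received(self, chunk: int) -> None:
        self.outstanding.pop(chunk, None)

    def check(self, now: float):
        """Return (chunk, server) re-sends for expired requests; skip chunks out of retries."""
        resend = []
        for chunk, req in list(self.outstanding.items()):
            if now - req.sent_at < self.timeout:
                continue
            if req.retries >= self.max_retries:
                del self.outstanding[chunk]
                self.skipped.append(chunk)  # the player's codec masks the missing frames
                continue
            # Re-send to another server, rotating through the alternatives.
            others = [s for s in self.servers if s != req.server] or [req.server]
            new_server = others[req.retries % len(others)]
            self.outstanding[chunk] = Outstanding(chunk, new_server, now, req.retries + 1)
            resend.append((chunk, new_server))
        return resend
```

With a one-second time-out and a single retry, a chunk sent to server `a` at t = 0 is re-sent to `b` at t = 1 and skipped at t = 2.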

Note that while, as mentioned earlier, Sunstar is complementary to ABR and should be able to integrate with ABR codecs, the current implementation is limited to fixed-rate codecs. Extending the design to make it fully compatible with ABR codecs is part of future work.

IV-E Discussion of Alternative Design Choices And Trade-offs

We acknowledge that there are a number of alternatives to our design of Sunstar. These include:

Fixed vs adaptive chunk sizes. Our design uses fixed-size chunks and adjusts rates by varying the number of chunks requested. An alternative would be to keep the number of chunks fixed and vary the chunk size (this is the approach of [youtuber]). We opted for the former approach because empirical evaluations of both approaches showed that it performed better given our goal of rate stability (see §V).

Application vs transport layer solution. An alternative to an application layer solution is a transport layer one, i.e., by extending a protocol such as MPTCP. A transport layer solution has advantages such as finer adaptation granularity and, therefore, faster reactions. However, extending MPTCP to work in a multi-server rather than single-server setting involves non-trivial changes to the network protocol stack. In addition, the tight coupling of MPTCP components can result in unexpected interactions [arzani14]. Avoiding or predicting them calls for a careful evaluation of proposed changes. Hence, an application layer solution offers a number of benefits, in terms of flexibility and ease of deployment.

V Performance Evaluation

Fig. 6: Impact of latency - Two servers, Medium , bursty bandwidth variations ( difference in latency).

We implemented a prototype of the Sunstar client, and in this section we report on its evaluation with respect to performance, namely: (1) improvements in video quality over a single server client; (2) improvements in video quality compared to two representative multi-server schedulers. The first is an RTT-based scheduler similar to that used (at the transport layer) by MPTCP. We refer to it as the Min-RTT scheduler and selected it as representative of schedulers that focus on performance, i.e., by prioritizing downloads from the “closest” servers. The second is the YouTuber scheduler of [youtuber], which works by estimating the bandwidth available to each server and requesting chunks at a rate equal to that estimate. As previous results exist for YouTuber, its comparison to Sunstar includes a few more scenarios than for the Min-RTT scheduler. (3) Its run-time performance, in particular, that of its core optimization routine. The other key aspect of the Sunstar design, i.e., cost, is evaluated in §VI.

(a) Fraction of time stalled.
(b) Number of skipped bytes.
Fig. 7: Live streaming schedulers comparison - Medium , bursty bandwidth variations.

V-A Evaluation Setup

Our evaluation setup is similar to that of §III, but spans a broader set of scenarios to offer a more comprehensive evaluation of Sunstar’s performance.

Our Sunstar prototype is a Linux-based MuMS client that retrieves video from Nginx web servers. The videos are  MB and broken into fixed chunks of  KB (results with slightly different chunk sizes and with Apache servers were similar). The codecs are fixed rate codecs with a target rate  Mbps (experiments with  Kbps yielded similar results). The scheduler’s epoch is  ms.
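Because the scheduler adjusts rates by varying the number of fixed-size chunks requested per epoch, a target rate maps to a chunk count through simple arithmetic. That count is generally fractional, which motivates the fractional-chunk accounting discussed in §IV. This sketch uses hypothetical example values, not the elided parameters of these experiments, and assumes 1 KB = 1000 bytes. `chunks_per_epoch` is a hypothetical helper.

```python
def chunks_per_epoch(rate_mbps: float, epoch_ms: float, chunk_kb: float) -> float:
    """Chunks needed per scheduling epoch to sustain rate_mbps (1 KB = 1000 bytes assumed)."""
    bits_per_epoch = rate_mbps * 1e6 * epoch_ms / 1000
    bits_per_chunk = chunk_kb * 1000 * 8
    return bits_per_epoch / bits_per_chunk
```

For instance, a 4 Mbps target with 100 ms epochs and 50 KB chunks needs exactly 1 chunk per epoch. A 2 Mbps target with 500 ms epochs and 100 KB chunks needs 1.25 chunks per epoch, which cannot be requested as a whole number of chunks.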

Fig. 8: Min-RTT vs. Sunstar schedulers - High , smooth bandwidth variations.
Fig. 9: Min-RTT vs. Sunstar schedulers - Medium , smooth bandwidth variations.

As before, experiments are carried out on the Emulab testbed. Clients connect to each server via a dedicated “path” consisting of a link connecting the client to a machine running dummynet, followed by a link connecting that machine to the server. The dummynet machine on path  is used to add a fixed latency of  ms and vary the capacity available to the client on path  from  to  with an average value of . Those variations seek to capture the impact of interfering traffic on the bandwidth available between clients and servers. We consider two types of bandwidth variations: smooth and bursty. Under smooth variations, the available bandwidth increases and decreases progressively in small, fixed-size steps (we use a number of different step sizes for each experiment). This seeks to mimic the progressive bandwidth fluctuations that arise from client arrivals and departures. In contrast, bursty bandwidth variations consist of large, sporadic changes in available bandwidth that represent abrupt changes in congestion, e.g., because of the start of a high-bandwidth download on a shared link or the start of a live streaming event.
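The two variation types can be illustrated by trace generators whose output could drive a link emulator. This is a minimal sketch, not the traces used in these experiments. Smooth traces take bounded ± steps of a fixed size, while bursty traces hold their value and occasionally jump to a random level within the range. Both start at the midpoint of the range, which does not guarantee a particular long-run average. The function names, step size and jump probability are hypothetical.

```python
import random
from typing import List


def smooth_trace(n: int, lo: float, hi: float, step: float, seed: int = 0) -> List[float]:
    """Bandwidth changes by a fixed small step up or down each interval, clamped to [lo, hi]."""
    rng = random.Random(seed)
    bw, out = (lo + hi) / 2, []
    for _ in range(n):
        bw = min(hi, max(lo, bw + rng.choice((-step, step))))
        out.append(bw)
    return out


def bursty_trace(n: int, lo: float, hi: float, p_jump: float, seed: int = 0) -> List[float]:
    """Bandwidth stays flat, then sporadically jumps anywhere in [lo, hi]."""
    rng = random.Random(seed)
    bw, out = (lo + hi) / 2, []
    for _ in range(n):
        if rng.random() < p_jump:
            bw = rng.uniform(lo, hi)  # abrupt congestion change
        out.append(bw)
    return out
```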

We experiment with three configurations: (i) High , where . This captures the bandwidth conditions observed in non-peak viewing times, as well as those observed by clients connecting to over-provisioned CDNs. (ii) Medium where  Mbps  Mbps. This scenario is more typical of today’s ecosystem, where CDN bandwidth provisioning and server capacity are “just enough” for the expected workload. In such a setting, the bandwidth available to the client occasionally falls short of its download rate, thus negatively impacting its playback experience. (iii) Low where . This is a scenario where the CDN and/or servers are oversubscribed, e.g., when a live streaming event is more popular than predicted. In the Low scenario, multiple paths (servers) are necessary just to meet the target rate.

Performance metrics. As in Section III, QoE is measured based on [QoE1, QoE2, QoE3]: (1) the number of skipped chunks; and (2) the fraction of time clients are stalled.

V-B Comparison to Single Server Clients

Fig. 10: Min-RTT vs. Sunstar schedulers - Low , smooth bandwidth variations.
Fig. 11: Min-RTT vs. Sunstar schedulers - Medium , bursty bandwidth variations.

We first compare the performance of the Sunstar client to that of a single server client (the typical streaming configuration today). This repeats the earlier MuMS validation of §III, but now using the Sunstar scheduler instead of the Min-RTT scheduler. Only the High and Medium configurations are considered, since they are the only two for which an individual path has enough (average) bandwidth. Experiments with smooth and bursty variations yielded qualitatively similar outcomes, and we therefore report only the former. Statistics for stall durations and number of skipped chunks for High and Medium are combined and shown in Figures 5(a) and 5(b), respectively (error bars show the  percentile confidence intervals, assuming Bernoulli-distributed samples). The figures confirm the results of §III, but now for the Sunstar client. A casual comparison of Figure 3 and Figure 5(b) also hints at the Sunstar scheduler outperforming the Min-RTT scheduler. We explore this aspect next.
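Confidence intervals for Bernoulli-distributed samples are commonly computed with the normal (Wald) approximation p̂ ± z·√(p̂(1 − p̂)/n). The sketch below shows that computation. The confidence level used for the figures is not stated here, so z = 1.96 (a 95% interval) is only an illustrative default, and the exact interval construction behind the error bars may differ. `bernoulli_ci` is a hypothetical helper.

```python
import math


def bernoulli_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation interval for a Bernoulli proportion, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)
```

For 50 stalled samples out of 100, the sketch gives 0.5 with an interval of roughly [0.402, 0.598].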

(a) Fraction of time stalled.
(b) Number of skipped bytes.
Fig. 12: YouTuber vs. Sunstar schedulers - Medium , bursty bandwidth variations.

V-C Comparison to the Min-RTT scheduler

We focus on a configuration where a single client downloads videos from a given set of servers (from 2 to 4). The client uses either the Sunstar scheduler or the Min-RTT scheduler. Due to the high variance in the performance of the Min-RTT scheduler, we report stall statistics (results for skipped chunks were of a similar nature) in the form of scatter plots, as the large confidence intervals for the Min-RTT scheduler make the results hard to interpret otherwise. The x-coordinate of each data point corresponds to the Min-RTT scheduler and the y-coordinate to the Sunstar scheduler. Points below the y = x line, therefore, indicate better performance for Sunstar. Results are shown in Figures 8 to 10 for the High, Medium and Low configurations under smooth link variations. Figure 11 presents results for one representative configuration with bursty bandwidth variations, namely, the Medium configuration.

The figures show that Sunstar consistently outperforms the Min-RTT scheduler irrespective of the number of servers used. The biggest improvements arise in the Low scenarios, where the limited resources amplify the need for judicious scheduling. The Medium scenario still sees Sunstar outperforming the Min-RTT scheduler, while the differences are less pronounced in the High scenario because the plentiful resources ensure that both schedulers perform well.

V-D Comparison to The YouTuber Scheduler

The next set of experiments compares the stall and skip statistics of the Sunstar and YouTuber schedulers, again for scenarios where clients download from a given set of servers. Like the Min-RTT scheduler, the YouTuber scheduler focuses on maximizing client performance. It achieves this objective by matching its download rate to the available bandwidth. Since the original design of [youtuber] only considers two servers, we limit our comparison to this scenario. In addition, because the YouTuber scheduler uses variable-size chunks, we compare the number of skipped bytes instead of the number of skipped chunks. YouTuber also lacks an explicit time-out mechanism. To avoid penalizing it, we therefore add a 10 s time-out period to its operation. When a stall exceeds the time-out,  KB of data (the minimum chunk size) is skipped. Finally, in reporting the results, we focus on Medium configurations, which are more representative, and on bursty bandwidth variations, as rapid changes are expected to stress both schedulers’ adaptability (in configurations with smooth bandwidth variations, Sunstar and YouTuber displayed comparable performance).

Figures 12(a) and 12(b) report stall and skip statistics, respectively, as boxplots, one per metric. The box boundaries correspond to the  and  percentiles, with the median as the line inside the box. The whiskers correspond to outliers.

The results indicate that under bursty bandwidth variations, the Sunstar scheduler outperforms the YouTuber scheduler in terms of stall statistics, but is slightly worse when it comes to skip statistics. This latter difference is mostly because YouTuber’s original design did not provide for time-outs, and the modification we applied relies on a relatively large (10 s) time-out. This choice was motivated by the fact that YouTuber relies on variable-size chunks and can at times request very large chunks. A low time-out value would then often have resulted in unnecessary skips for those very large chunks. A 10 s time-out avoided this problem and limited the number of skips that YouTuber incurred, but at the cost of a higher frequency of stalls, as seen in Figure 12(a). A complete solution to the problem likely calls for an adaptive time-out mechanism based on chunk size. However, designing and implementing such a solution is beyond the effort we could invest in extending the YouTuber design. The results of Figure 12 offer a representative sample of how the YouTuber and Sunstar schedulers compare in terms of performance, with both typically offering better performance than, say, the Min-RTT scheduler. However, as we shall see in §VI, Sunstar’s benefits also extend to affording those performance improvements without impacting the (peering) costs of the video provider.

V-E Impact of Latency Heterogeneity

The previous experiments assumed identical propagation delays to all servers. In this section, we briefly test how heterogeneity in propagation delays affects the schedulers’ efficacy. For simplicity, we limit ourselves to a scenario with two servers, Medium , and bursty bandwidth variations. The difference in propagation delays on the paths to the two servers varies from  ms to  ms. The results are reported in Figure 6. As the figure shows, the Min-RTT scheduler exhibits the worst performance, which degrades as  increases. YouTuber is again successful at avoiding skips, but only because of the relatively large time-out we configured, and at the cost of frequent stalls. Sunstar performs well, with low stall and skip statistics and relative insensitivity to .

V-F Live Streaming

Although video downloads represent the bulk of video traffic, live streaming remains an important service. It is, therefore, also of interest to explore the benefits of a MuMS solution for live streaming. Note that unlike video downloads, live content is generated at a constant rate, so that pre-buffering options are limited. We, therefore, emulate a live streaming experience by first writing a number of bytes equal to the client’s pre-buffering threshold into a file. Subsequently, the server continues to write to the file at a constant rate of  until the entire file is created. Clients start sending requests after the pre-buffer portion of the file has been created. Our experiments compare the Min-RTT, Sunstar, and a modified YouTuber (YouTuber-limited) scheduler for live streaming clients. The YouTuber client was not originally designed for live streaming, and its assumption of an unlimited playback buffer would often result in infeasible requests; initial live streaming experiments with the original YouTuber design confirmed its poor performance. We, therefore, modified it by introducing a limited playback buffer that avoided many of those problems, although, in all fairness, there might be better modifications that allow it to accommodate live streaming. We focus on a Medium configuration with bursty bandwidth variations.
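Under this emulation, the bytes available at time t (measured from when clients start requesting) are min(total, P + R·t), where P is the pre-buffer and R is the write rate. A chunk can therefore only be served once its last byte has been written, and requesting it earlier wastes a request. The following helpers sketch that check under the stated assumptions: fixed-size chunks, with a possibly shorter final chunk. All names are hypothetical.

```python
def bytes_available(t: float, prebuffer: float, rate: float, total: float) -> float:
    """Bytes of the live file written t seconds after clients start requesting."""
    return min(total, prebuffer + rate * max(t, 0.0))


def chunk_ready_time(i: int, chunk_size: float, prebuffer: float, rate: float, total: float) -> float:
    """Earliest time at which chunk i (0-based) has been fully written."""
    end = min((i + 1) * chunk_size, total)
    return max(0.0, (end - prebuffer) / rate)


def is_servable(i: int, t: float, chunk_size: float, prebuffer: float, rate: float, total: float) -> bool:
    # A request issued before this point targets content that does not exist yet.
    end = min((i + 1) * chunk_size, total)
    return end <= bytes_available(t, prebuffer, rate, total)
```

With a 1000-byte pre-buffer, a write rate of 100 bytes/s and 500-byte chunks, chunks 0 and 1 are servable immediately, but chunk 2 only becomes servable after 5 s. An aggressive client that requests it earlier stalls, which matches the YouTuber behavior discussed for these experiments.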

The results are shown in Figures 7(a) and 7(b), which illustrate that Sunstar outperforms the other schedulers in both stalls and skips. YouTuber-limited has the worst performance, in part because, in spite of our modifications, its aggressive download strategy results in unnecessary stalls. Its design aims at maximizing its download rate during “on” periods (when the playback buffer content drops below a pre-specified threshold). In live streaming, such an aggressive download strategy can result in requesting content before it is available. This results in a stall while waiting for a retransmission (of the request). Limiting the YouTuber buffer size, as we did, allows the client to pace itself (by allowing the playback buffer to fill up and the player to go into “off” mode), but is still not entirely successful at eliminating unnecessary stalls.

We repeated the above experiments with larger pre-buffers. A larger pre-buffer creates a larger “margin” to absorb subsequent rate variations, at the cost of a delayed start in the live stream. This should benefit all schedulers, but particularly YouTuber, as it can help mitigate occurrences of stalls. As expected, performance improved for all schedulers, but Sunstar continued to outperform the other two.

V-G Scheduler Execution Time

Last, we evaluated Sunstar’s run time performance. Recall that Sunstar’s optimization runs every epoch (ms in our experiments), so that its efficiency matters.

The optimization uses the Mosek Solver [mosek], and the Emulab client machines on which it runs are Dell PowerEdge 2850s with a single 3GHz processor, 2GB of RAM, and two 10,000 RPM 146GB SCSI disks [pc3000].

Based on our experiments, the run time increases with both the number of servers and the amount of bandwidth available to a client on each path (both contribute to larger search spaces for the optimization). We varied the number of servers from two to four and the (average) bandwidth available to individual clients on each path from  Mbps to  Mbps. The fastest average run time was for a two-server, low-bandwidth configuration, for which it was  ms. The longest average run time was for four servers in a high-bandwidth scenario, where it was  ms. In both configurations, the margins are the  percent confidence intervals. The run time increased significantly when using more than four servers, suggesting that without additional computational optimizations, e.g., offloading computations to the cloud, four servers likely represent a realistic limit. We also experimented with increasing the epoch duration, but while some increases are possible beyond our initial value of  ms, Sunstar’s performance degrades rapidly as it increases further. This is because longer epochs limit Sunstar’s ability to adapt to bandwidth variations.

VI Effect on Peering Costs

Sunstar’s primary motivation is to improve video quality without negatively impacting peering costs. The insight derived from Section III led to a design that minimizes rate variations while keeping the download rate as close as possible to the minimum feasible rate, i.e., the target download rate. The previous section showed this was effective in improving video quality. The focus here is on establishing that those benefits are realized without increasing the provider’s cost. There are two separate aspects to the impact on cost. The first is the effect of the scheduler on rate variations on individual peering links, which affect the  percentile. The second is the influence of server selection on the load of individual peering links. We explore both separately in this section.
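Rate variations matter because percentile-based (burstable) billing charges for a high percentile of per-interval traffic rather than for its mean. Two links carrying the same average volume can therefore cost very different amounts. The sketch below illustrates this using the nearest-rank percentile and the 95th percentile, which is a common industry billing convention used here only as an example. The total provider cost is taken as the sum of per-link percentiles. The percentile the paper uses is not restated here, and both function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile_cost(samples: Sequence[float], pct: float = 95.0) -> float:
    """pct-th percentile of per-interval traffic samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def provider_cost(per_link: Dict[str, Sequence[float]], pct: float = 95.0) -> float:
    """Sum of per-link percentile costs, one peering link per server."""
    return sum(percentile_cost(s, pct) for s in per_link.values())
```

A link that carries a steady 5 units per interval is billed at 5. A link alternating evenly between 2 and 8 has the same mean but is billed at 8, which is 60% more for the same volume.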

We evaluate peering costs using a setup similar to that of Section V. The main differences are increases in both the number of clients simultaneously active, and the number of servers available to them (we now have  servers to choose from). The latter allows us to consider the impact of server selection on cost. While this section is only concerned with cost, we also evaluated Sunstar’s performance and verified that its benefits remain qualitatively similar to those of Section V.

Fig. 13: Peering costs under different schedulers using round robin server selection.

VI-A Scheduler Impact

Because the YouTuber scheduler behaves aggressively when it comes to download rate (it seeks to use as much of the available bandwidth as possible), we expect (and validated) that it performs poorly when it comes to cost. We, therefore, focus our efforts on comparing Sunstar to a single server solution (baseline), and to a client using the Min-RTT scheduler.

In a given round of experiments, clients connect to out of Emulab servers. Since Emulab has a limited number of physical machines available, we configure each machine to have active clients at any point in time, for a total of  clients in the system. To emulate an environment with clients coming and leaving, clients watch videos of fixed duration chosen from a set of minutes long videos, and then leave to be replaced by a new client that randomly chooses a new video. Video selection is biased towards shorter videos (based on the observations of [gill2007youtube]).

Each physical machine has a dedicated link to a dummynet node through which clients originating on that machine experience bandwidth variations that are independent of those for clients on other machines. We use a Medium configuration, as described in the previous section, but scale the bandwidth by a factor (to account for the number of clients on the link). Low and high bandwidth scenarios yielded qualitatively similar results in terms of cost. Each server is in turn logically connected to a single peering link shared by all clients accessing it. The bandwidth on the peering link itself is high enough to avoid congestion, independent of the number of clients assigned to the server. New clients first connect to a “master” server, which redirects them to a list of servers, from which to download their video. In this section, the master uses a simple round-robin server assignment strategy to select which  servers () to assign to a new client.
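A round-robin master can be sketched as a pointer that walks the server list, handing each new client the next k servers. This is a minimal sketch. Advancing the pointer by k rather than by 1 is our assumption, chosen so that consecutive clients land on disjoint servers where possible. `RoundRobinMaster` is a hypothetical name.

```python
from typing import List, Sequence


class RoundRobinMaster:
    def __init__(self, servers: Sequence[str], k: int):
        if not 1 <= k <= len(servers):
            raise ValueError("k must be between 1 and the number of servers")
        self.servers = list(servers)
        self.k = k
        self.next = 0  # index of the first server for the next client

    def assign(self) -> List[str]:
        """Return the k servers for a newly arriving client."""
        n = len(self.servers)
        chosen = [self.servers[(self.next + j) % n] for j in range(self.k)]
        self.next = (self.next + self.k) % n
        return chosen
```

With five servers and k = 2, successive clients receive `["s0", "s1"]`, `["s2", "s3"]` and `["s4", "s0"]`.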

For comparison purposes, a given experiment uses the same link bandwidth variation patterns and server assignments for all schedulers. An experiment spans  hours, with the provider cost obtained by summing the per-server percentile peering costs (traffic volumes) in those  hours. Statistics are then computed over a set of independent experiments. Figure 13 reports results for the following configurations: Single Server, our baseline, Min-RTT with 2 and 3 servers, Sunstar with 2 and 3 servers. The definition of the boxplots used in the figure is similar to that of Section V.

Fig. 14: Peering costs comparison with “smart” server selection scheme that jointly optimizes for cost and performance.

We make two observations from the results of Figure 13. The first is a confirmation of the insight of Section III-C, namely, cost increases with the number of servers. This is seen in the figure for both the Min-RTT and the Sunstar schedulers, which display cost increases as the number of servers goes from  to , even if the magnitude of the increase is slightly smaller for Sunstar. The latter leads to our second observation, namely, Sunstar’s rate variation minimization strategy is successful in mitigating cost increases. Specifically, the Sunstar  servers configuration outperforms not only a  servers solution using the Min-RTT scheduler, but also the baseline single server configuration. And the Sunstar  servers configuration has a cost comparable to the single server baseline, and significantly lower than its Min-RTT counterpart. This offers empirical validation of the simple analysis of Section III-C and the guidelines it inspired. In other words, Sunstar succeeds in improving video performance without affecting the provider’s cost.

VI-B Server Selection Impact

This section explores whether a server selection strategy that jointly optimizes for cost and performance can help Sunstar further reduce peering costs. To the best of our knowledge, no prior MuMS server selection design explicitly targets minimizing provider cost. We therefore formulate the server assignment problem as a constrained optimization (see Appendix B for details) that seeks to greedily minimize increases in cost when assigning servers to new clients, while meeting client rate constraints. Note that unlike the optimization of the Sunstar scheduler, this optimization is required only once, when a new client starts.

The results from experiments combining this optimization with the Sunstar scheduler are shown in Figure 14, which compares the cost of Sunstar for  and  servers under the previous round-robin assignment policy to its cost using the results of the optimization of Appendix B. The figure illustrates that optimizing the server assignment did not yield a meaningful reduction in cost, with the outcome falling between the results for the  servers and  servers scenarios. This is not unexpected, since the number of servers that the optimization assigns to a given client is not fixed. In particular, the optimization may assign any number of servers to a new client (up to the maximum number available) when warranted by performance. Because of space limits, we do not report the performance of the combination of Sunstar and our server assignment optimization, but it did not offer statistically significant improvements in performance. In other words, optimizing server assignment did not help the Sunstar client achieve either better performance or lower cost when compared to a simple round-robin assignment policy. (A separate experiment using a server selection strategy that picks the closest servers, i.e., those with the lowest RTT, yielded a similar outcome.) The latter is partly due to the Sunstar scheduler design, as its goal of keeping rates low (as per the results of Theorem 1) realizes much of the available gains in cost reductions.

This being said, we acknowledge limitations in the optimization of Appendix B. In particular, its reliance on a greedy approach to estimate the impact of a new server assignment on the percentile cost can clearly be improved, albeit at the cost of significant added complexity. A solution that better predicts the impact of different assignments on costs may, therefore, further improve on our formulation.

VII Conclusion

The paper presented the design and implementation of Sunstar, a MuMS client aimed at improving video quality without impacting providers’ peering costs. Sunstar relies on insight developed from simple models for evaluating the effect of both multiple paths and download rates on peering costs. Those models helped illustrate why both multiple paths and aggressive download rates could negatively affect providers’ costs. This led to a design based on an optimization framework that guarantees clients a target rate, while minimizing rate variations. Experiments on Emulab demonstrated Sunstar’s ability to successfully meet its goals.

Appendix A Mechanical Turk Experiment

(a) Single stall.
(b) Multiple stalls, spaced out.
(c) Multiple stalls, bursty.
Fig. 15: Qualitative (Mechanical Turk) analysis of video QoE.

The motivation was to further validate the QoE metrics we selected to evaluate video quality. For that purpose, we used a high quality (HD) documentary about Buckingham Palace. The video was divided into equal-sized segments of  min each, and different types of impairments were introduced in those segments. Due to logistical constraints, only results for stalls are available. Specifically, we considered: 1) a single stall of variable duration at a random location in the video; 2) multiple stalls of small ( sec), medium ( sec), and long ( sec) durations, evenly distributed in the segment; 3) multiple stalls with the same distribution in duration, but now closely spaced ( sec) in a burst. In 2) and 3) we varied the number of stalls. The quality of the video segments was evaluated on a scale ( being the lowest quality) by users recruited through Amazon’s Mechanical Turk market. For calibration purposes, users were first presented with an unimpaired video segment, and told to assign it a rating of .

Results of the study are presented in Figure 15, which confirms a strong correlation between stalls, both number and duration, and video quality. The limited size of the study is clearly insufficient for broad conclusions, but it further confirms previous QoE studies [QoE1, QoE2, QoE3] and the impact of stalls on video quality. Hence, fewer/shorter stalls do translate into higher video quality.

Appendix B Server Selection Algorithm

Server selection can be viewed as a Stackelberg game between the clients and the provider, with the provider as the leader and clients as the followers. Once assigned servers, clients seek to maximize their performance by scheduling requests to servers accordingly. Given this behavior, the provider’s goal is to assign servers so as to minimize the percentile cost. This non-convex cost function, together with the online nature of the game, makes computing the optimal assignment strategy hard.

We therefore propose a semi-online greedy optimization that is run every  mins and uses the current estimate of the percentile cost to assign clients to servers in a way that meets their rate guarantees while minimizing cost. Specifically, the optimization maintains an estimate of the number of clients expected to arrive from each region (clients in a region have similar bandwidth profiles and share the same connections to servers). Given these estimates, it seeks to identify which assignment of servers for each group of clients results in the smallest increase in the current percentile cost. Furthermore, while the optimization’s goal is to minimize peering costs, it acknowledges that this should not come at the expense of poor client performance. Thus, it also includes two additional constraints:


where is the number of requests client sends to server and is the rate in chunks per second from that client’s region to server .

Note that, reusing the notation of Section IV-A1, Eq. (5) can be written as where is a matrix with and , is a vector where , and .

Take as the current percentile cost on peering link , the current load on peering link , and the expected number of clients arriving from region in the current decision period. We aim to solve the following optimization:

where is the maximum window size allowed on the clients, and the server selection algorithm assigns all servers with to region . It is straightforward to show that in the above equations is positive semidefinite. Therefore, the optimization is convex and can be solved efficiently.