Delay-Sensitive and Power-Efficient Quality Control of Dynamic Video Streaming using Adaptive Super-Resolution

by   Minseok Choi, et al.

Over the past decade, adaptive quality control of video streaming and the super-resolution (SR) technique have been deeply explored. As edge devices now have greater processing capability than ever before, streaming users can enhance the received image quality themselves, which allows the transmitter to compress the images to save its power or pursue network efficiency. In this sense, this paper proposes a novel dynamic video streaming algorithm that adaptively compresses video chunks at the transmitter and separately enhances the quality at the receiver using SR. In order to allow transmission of video chunks with different compression levels and control of the computation burden, we present the adaptive SR network for dynamic video streaming, which is optimized by minimizing the weighted sum of losses extracted from different layer outputs. In addition, we jointly orchestrate video delivery and resource usage, and the proposed video delivery scheme balances the tradeoff among the average video quality, the queuing delay, the buffering time, the transmit power, and the computation power. Simulation results show that the proposed scheme pursues the quality-of-service (QoS) of video streaming better than both the adaptive quality control without cooperation of the transmitter and the receiver and the non-adaptive SR network.


1 Introduction

Recently, with the rapidly increasing number of smart user devices, the content delivery network has been getting more attention for supporting the excessively large global data traffic. As reported in [1], tens of exabytes of global data traffic are handled daily at present, and most of this traffic is dominated by online video services, e.g., video streaming, video-on-demand (VoD), live streaming, and VR streaming. Depending on the type of application, the quality-of-service (QoS) and the quality-of-experience (QoE) of video services, such as playback stall, latency, video quality, and quality fluctuations, have been widely studied [2].

According to [3], the internet-of-things (IoT) devices will account for 50% of all global networked devices by 2023; therefore, it is expected that online video services in the IoT and/or vehicular networks will be increasingly required. For example, autonomous driving improves passengers’ trip experiences by providing video applications with the help of vehicle-to-vehicle (V2V) communications [4]. Another possible scenario is the device-to-device (D2D)-assisted wireless caching network [5]. In this case, cache-enabled devices having limited storage size and power budget can directly deliver the desired contents to streaming users; thus, balancing the tradeoff among communications, computations and streaming users’ QoE has become a very critical issue.

As video streaming systems have been deployed in wireless networks, video applications have to be provided to users with limited wireless resources under time-varying channel conditions. In addition, heterogeneous user preferences over applications, time, and geographical locations simultaneously require different quality versions of various contents. To deal with the above challenges, dynamic adaptive streaming over HTTP (DASH), which dynamically chooses the most appropriate bitrate, has been used in wireless networks [6]. In DASH systems, a stream consists of sequential chunks, and video chunks are allowed to have different quality levels; therefore, bitrate adaptation schemes have been extensively studied [7].

In parallel, with the development of edge devices having caching and computational capabilities, wireless caching and mobile edge computing (MEC) technologies have been considered as efficient methods for improving the performance of adaptive bitrate (ABR) streaming [8]. In general, video contents are encoded into multiple levels representing different bitrates and qualities; however, wireless caching helpers have a limited storage size, so caching identical content with various quality versions is inefficient. Therefore, MEC technologies are employed to transcode the cached contents depending on the network status and/or the buffer states of streaming users. In particular, as scalable video coding (SVC), which conveniently accumulates enhancement layers above a base layer for higher quality versions, has gained popularity [9], joint optimization of computational tasks and video delivery has become more powerful. On the other hand, at the user (receiver) side, edge devices can improve the quality of streams by themselves with the help of increased computing power. Super-resolution (SR) is a representative technique for enhancing the video quality by using deep neural networks.

This paper considers the situation in which the transmitter, having the capability of transcoding or compressing the desired video contents, is willing to deliver video chunks to the receiver, which has the capability of enhancing the video quality by using SR. Even though the computing power and computational capability of edge devices have improved, the operation of SR is still a time- and power-consuming task. Therefore, we focus on adaptive control of video delivery with differentiated quality requirements, transcoding at the transmitter side, and quality enhancement using SR at the receiver side. Video streaming over the IoT network having cache-enabled MEC entities or the vehicle-to-everything (V2X) network is a potential scenario where the transmitter also has a limited power budget. Thus, the goal of our adaptive video streaming system is to pursue (i) maximizing the video quality, (ii) reducing the playback stall events, (iii) limiting the power consumption of both the transmitter and the receiver, and (iv) maintaining the stability of the video queue.

The main contributions are as follows.

  • This paper employs the SR technique for dynamic video streaming, which is applicable when the transmitter and the receiver (i.e., the user) have the capability of compressing and improving the quality of video chunks, respectively. In order to allow receiving video chunks and/or images with different quality levels and controlling the computational burden of SR, we employ the adaptive SR network, which is optimized by minimizing the weighted sum of losses extracted from different layer outputs.

  • This paper proposes the adaptive quality control of video chunks depending on the time-varying network condition and both the transmitter and the receiver states. We allow the transmitter to determine the video quality enhancement rate at the receiver as well as the compression rate at the transmitter, and it is beneficial to control a variety of performance metrics while improving the average quality measure.

  • Joint orchestration of video delivery and resource usage of the transmitter and the receiver is proposed based on the Lyapunov optimization framework by employing the adaptive SR and allowing cooperation between the transmitter and the receiver for adaptive quality control. The proposed dynamic video delivery scheme balances the tradeoff among the video quality measure, the queuing delay, the buffering time, the transmit power consumption, and the CPU usage.

  • Simulation results show that the adaptive SR network enhances the quality of input images with different compression rates and adaptively controls its computational burden, inference delay, and output quality. Also, we show that the proposed dynamic video delivery balances the variety of performance metrics much better than the adaptive quality control scheme without cooperation of the transmitter and the receiver, and the dynamic video streaming using the non-adaptive SR network.

The rest of the paper is organized as follows. The existing literature related to our work is summarized in Section 2, and the dynamic video streaming system is described in Section 3. The adaptive control problem of video delivery and computational tasks is formulated in Section 4, and the proposed adaptive control algorithm and its comparison technique are described in Section 5. Our simulation results are presented in Section 6, and Section 7 concludes this paper.

2 Related Work

This section presents the related work of adaptive quality selection for dynamic streaming, cache/MEC-assisted ABR streaming, deep learning-based SR, and learning-based ABR streaming.

2.1 Adaptive Quality Selection for Dynamic Streaming

The ABR streaming has been considered a promising scheme for providing online video services via wireless links to users because it dynamically chooses video bitrates or quality so that streaming can efficiently adapt to time-varying environments with limited wireless resources [7]. Here, the fundamental issue is how to dynamically adapt the bitrate and how to determine which quality is appropriate for the current state. The existing quality adaptation schemes are generally based on the network condition [10], users’ buffer states [11], or both of them [12, 13, 14].

Due to the limited wireless resources and time-varying wireless channel conditions, joint optimization of quality adaptation and resource allocation has been widely studied in [15, 16, 17]. In [15], a scheduling and resource allocation algorithm that maps SVC layers to DASH layers and reduces video playback interruptions is presented. Scheduling and quality selection are adaptively determined depending on both the channel condition and users’ buffer states in [16], and similarly, an adaptive video quality selection and resource allocation method is proposed in [17].

2.2 Cache/MEC-Assisted ABR Streaming

The ABR streaming originated from Internet video streaming, in which a remote Internet server holds the whole video library. Therefore, in the wireless streaming system, the desired contents are first fetched from the remote Internet server to radio access networks (RAN) via the wired core network; then, the wireless RAN transmits the contents to users [18]. However, since fetching videos from the remote server to the RAN can result in long latency and congestion [19, 20], cache/MEC-assisted ABR streaming has been considered a promising technique for mitigating latency and congestion issues [8].

The existing studies on cache/MEC-assisted ABR streaming jointly determine the appropriate quality selection and (i) content placements [21, 22, 23] or (ii) transcoding rate [24, 25, 26]. The cache management scheme that maximizes both the users’ QoE and energy cost saving, and the probabilistic caching method for consecutive video requests are presented in [21] and [22], respectively. EdgeDASH, the network-side control scheme to facilitate the caching capability, is proposed for appropriate quality assignments and reducing stall events in [23]. Meanwhile, the authors of [24] use MEC to provide low-latency ABR streaming by dynamically adapting bitrates. With the transcoding ability at cache-enabled nodes, link scheduling, power allocation, and delivery of individual video chunks are adaptively optimized in [25, 26]. Moreover, there are recent studies on dynamic streaming taking both caching and MEC into account [8, 27, 28]. A joint caching and transcoding policy that maximizes the video capacity of the network is proposed in [27]. The joint caching and processing framework that determines the caching method for contents with different qualities and the scheduling of user requests is proposed in [8], and further, energy efficiency is considered as an additional performance metric in [28].

2.3 Deep Learning-Based Super-Resolution

The SR technique improves or recovers the quality of images or video frames, and modern SR techniques are mostly developed using deep learning methods. Among the deep learning-based SR methods, the SR convolutional neural network (SRCNN) [30], which uses multiple convolutional layers, is one of the well-known SR techniques. The deep learning-based SR method proposed in [31] improves the output images by allocating image input signals into the output layer. The aforementioned SR algorithms evaluate results on numerical performance values only, e.g., peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). On the other hand, generative adversarial network (GAN)-based SR methods have recently been proposed to pursue soft texture and smoothness of output images [32, 33].

While most of the existing studies focus on the SR technique itself, not many have yet applied SR to practical video streaming and/or content delivery networks and jointly optimized the network decisions (e.g., bitrate adaptation, scheduling, and data transmission). Recently, on-device SR computation methods, which enable the enhancement of the video quality independently at the receiver, have been proposed in [35, 36, 37]. Specifically, Dejavu in [35] enhances videoconferencing in real time by employing the historical sessions. The deep neural network for SR is applied to the adaptive streaming system in [36], and the authors of [37] show the quality enhancement of 360-degree video streaming by using the SR technique. However, the above studies have not considered the adaptive SR, which controls the quality enhancement level depending on the buffer and power state of the receiver, and have not jointly optimized the delivery decisions (e.g., transmission power, computing power).

As the number of residual blocks (i.e., hidden layers) of the neural network increases, the deep learning-based SR achieves better performance at the expense of the speed of SR processing. In other words, a tradeoff between the performance and the computation time (delay) is observed. The authors of [34] offload the computational SR tasks to the cloud to avoid excessive latency; however, bitrate adaptation of video streaming and optimization of content delivery and network states are not considered. The anytime neural network (ANN) [38] fundamentally controls this tradeoff by allowing quick and coarse prediction results and refining them if the computational budget is available. We apply the concept of the ANN to the GAN-based SR method to control the tradeoff between SR performance and computational tasks.

2.4 Learning-Based ABR Streaming

Recently, learning-based adaptive quality selection methods for ABR streaming have been extensively researched in [39, 40, 41], but they do not capture the characteristics of wireless networks. Subsequently, deep neural networks (DNNs) have been used for jointly optimizing quality adaptation and resource allocation for dynamic streaming in wireless networks in [42, 43, 44]. In [42], QFlow, which is a reinforcement learning approach to selecting bitrates for wireless streaming by adaptively controlling flow assignments to queues, is introduced. Power-efficient wireless ABR streaming is proposed in [43], in which power control is jointly optimized with minimization of the video transmission time by using deep reinforcement learning (DRL). Furthermore, DNN-assisted dynamic streaming using multi-path transmissions is presented in [44].

In MEC-assisted streaming systems, the authors of [45] proposed the DRL method for quality adaptation and transcoding that balances the tradeoff between the QoE of video services and computational costs of transcoding. Similarly, a joint framework of quality adaptation and transcoding is presented in [46] using soft actor-critic DRL, which further reduces bitrate variance.

Nevertheless, quality enhancement at the edge device for smooth and high-quality streaming has not been widely studied yet, except for [36, 47, 49]. Quality enhancement of video contents using SR at the receiver side is first realized for adaptive video delivery in [36], and further, the efficient content-aware video delivery is proposed by leveraging redundancy across videos in [47]; however, characteristics of wireless networks are not captured. In [49], the DRL method of integrating the SR technique for quality improvement of videos with the wireless video streaming system is proposed. This scheme jointly pursues high quality, low quality variations, and infrequent rebuffer events; however, adaptive controls of computational tasks of transcoder and the DNN for SR are not considered.

TABLE I: System Description Parameters

  • Compression rate
  • Number of transmitting images
  • Transmit power
  • Depths of super-resolution network
  • GPU core usage
  • Transmitter queue length
  • Receiver buffer length
  • Virtual queue for limiting power consumption
  • Virtual queue for limiting GPU consumption
  • Arrival rate of transmitter queue
  • Size of file compressed with
  • Channel gain
  • Processing time of ASRGAN
  • Task size of image recovery
  • Number of possible quality levels
  • Discrete time duration
  • Threshold for average power consumption
  • Threshold for average GPU core consumption
  • Transmit power budget

3 Problem Statement

This paper focuses on adaptive quality control in the wireless video streaming system as shown in Fig. 1. Let the transmitter have the capability of compressing the high-quality images requested by a user and the user have the ability to enhance the quality of the received images. Each video file or stream consists of a series of images, called chunks, each of which covers a fixed playtime. When the user starts to play the stream, the server delivers the desired video chunks in sequence to the user.

3.1 Transmitter Queue Model and Video Transcoding

Suppose that the transmitter has all of the desired video contents with the highest quality. These images are accumulated in the first-in-first-out (FIFO) transmitter queue as shown in Fig. 1. The transmitter is equipped with a video transcoder; therefore, the desired images can be compressed before they are delivered to the user. Denote as the set of video bitrates, where is the number of possible video bitrates. At the transmitter side, there are three decision parameters at every slot as follows: 1) the number of chunks to be delivered, denoted by , 2) the transcoding rate of the images, denoted by , and 3) the transmit power . Even when the average streaming quality is very high, frequent fluctuations in video quality could degrade the user’s QoS. Therefore, assume that the chunks to be delivered in the same slot are compressed with the identical rate. In other words, chunks have the identical transcoding rate . The quality and the size of each chunk are determined by the transcoding rate . Denote and as the quality and the size of the chunk compressed with the rate , respectively.

Fig. 1: Adaptive Quality Control of Video Streaming System

The queue dynamics in each time slot can be represented by and , where and stand for the queue backlog and the arrival process of the transmitter queue at slot , respectively. The queue backlog counts the number of chunks in the queue. and semantically mean the numbers of the requested and transmitted images, respectively. Simply, we assume the uniform distribution of , i.e., , where is the maximum number of the chunks that can be transmitted at each slot. On the other hand, obviously depends on the capacity of the communication link between the server and the user and the transcoding rates of images as follows:


where is the channel capacity in bits at slot . Also, is the time duration of each slot, is the bandwidth, is the channel gain between the server and the user, and is the noise variance.
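The queue update and the capacity limit on the number of transmitted chunks can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: all function and argument names are hypothetical, the capacity is taken to be the Shannon capacity accumulated over one slot, and the channel gain is treated as a power gain.

```python
import math


def channel_capacity_bits(slot_sec, bandwidth_hz, tx_power, channel_gain, noise_var):
    """Assumed Shannon capacity over one slot, in bits.

    channel_gain is treated as a power gain (an assumption of this sketch).
    """
    snr = tx_power * channel_gain / noise_var
    return slot_sec * bandwidth_hz * math.log2(1.0 + snr)


def max_transmittable_chunks(capacity_bits, chunk_size_bits, backlog):
    """Largest number of identically compressed chunks that fits in the
    slot capacity and is actually waiting in the transmitter queue."""
    return min(backlog, int(capacity_bits // chunk_size_bits))


def update_transmitter_queue(backlog, departures, arrivals):
    """One-slot queue update: serve first, then admit new arrivals."""
    return max(backlog - departures, 0) + arrivals
```

For example, with a capacity of 1e6 bits and chunks of 3e5 bits, at most three chunks can be sent in the slot, regardless of how many are waiting.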

The Rayleigh fading channel is assumed for the communication link from the server to the user. Denote the channel with , where controls slow fading with being the server-user distance and represents the fast fading component having a complex Gaussian distribution, . Here, is the pathloss exponent. Since the transmitter has the finite power budget , i.e., , we can assume the upper bound on the expected value of , i.e., .
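A channel realization of this kind can be drawn as in the sketch below. The names are hypothetical, and the fast fading component is assumed to be a zero-mean, unit-variance circularly symmetric complex Gaussian; the variance actually used in the paper is not recoverable from the text here.

```python
import math
import random


def sample_channel(distance, pathloss_exp, rng=random):
    """Draw one Rayleigh-fading channel coefficient.

    Slow fading is modeled as distance ** (-pathloss_exp); fast fading is
    assumed to be CN(0, 1), i.e. real and imaginary parts are i.i.d.
    Gaussian with variance 1/2 each.
    """
    slow_fading = distance ** (-pathloss_exp)
    sigma = math.sqrt(0.5)
    fast_fading = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return math.sqrt(slow_fading) * fast_fading
```

Under these assumptions the average power gain, i.e. the mean of `abs(h) ** 2`, equals the slow fading term, so a longer server-user distance lowers the average capacity.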

3.2 Adaptive Super-Resolution Generative Adversarial Network (ASRGAN)

In this section, we introduce the Adaptive Super-Resolution Generative Adversarial Network (ASRGAN). The ASRGAN is based on SRGAN [32] and is inspired by the depth-controllable very deep super-resolution network (DCVDSR) [48, 38]. Denote the lossless high-resolution image as and the compressed image with the compression rate as . The ASRGAN consists of two components: the generator and the discriminator . residual blocks in enhance the quality of . In addition, the ASRGAN can extract the feature of the input image from every -th residual block for all . The extracted feature of the -th residual block is defined as follows:


where stands for a set of parameters of all blocks from the initial one to the -th residual ones , respectively.

3.2.1 Loss function

In this subsection, we introduce the loss function of the ASRGAN. According to [32], the mean squared error (MSE) , the Euclidean distance loss , and the adversarial loss are combined into a composite loss function. Here, compares the feature of the pretrained VGG19 to , and represents how well it can be distinguished whether the unknown image is the output of or the . The aforementioned loss functions are expressed as follows:


where represents VGG19, and stands for the L2 loss. Since the ASRGAN can extract SR images from , the loss function should be designed in the consideration of training for all . Therefore, we propose that the loss functions (e.g., , and ) are reorganized into the form of the weighted sum. The newly introduced loss function is presented as follows:

where (10)

where is the weight factor for the loss of . According to [38], it is possible to train the neural network successfully, if the weight for the shallow residual block is small (i.e., is small) and the weight increases as the residual block goes deeper (i.e., is large), which is described in (10).

Finally, the generator parameter and the discriminator parameter can be trained by minimizing (6)–(8) as follows:


where , and stand for the scaling factor for , and , respectively.
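The depth-weighted loss and the scaled generator objective described above can be sketched as follows. This is an illustration only: the linearly increasing weights are an assumption that merely satisfies the stated property (small weights for shallow blocks, larger weights for deeper ones), and all names are hypothetical.

```python
def depth_weights(num_blocks):
    """Assumed weights that grow linearly with the block index and sum to 1."""
    total = num_blocks * (num_blocks + 1) / 2
    return [k / total for k in range(1, num_blocks + 1)]


def weighted_loss(per_block_losses, weights):
    """Weighted sum of the losses measured at each residual-block output."""
    if len(per_block_losses) != len(weights):
        raise ValueError("one weight is required per residual block")
    return sum(w * loss for w, loss in zip(weights, per_block_losses))


def generator_objective(mse_loss, feature_loss, adversarial_loss, scales):
    """Scaled sum of the three (already depth-weighted) loss terms."""
    s_mse, s_feature, s_adv = scales
    return s_mse * mse_loss + s_feature * feature_loss + s_adv * adversarial_loss
```

With four residual blocks, `depth_weights(4)` gives weights proportional to 1, 2, 3, and 4, so the deepest output dominates training while shallow outputs still receive a gradient signal.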

3.2.2 Data preprocessing

In this subsection, we introduce the method of data preprocessing for training the ASRGAN. The transmitter has a high-resolution image of size . The transmitter compresses into of size with bicubic interpolation. Subsequently, the transmitter sends to the receiver. Meanwhile, the receiver transforms the compressed image into image of size . Regardless of the resolution, has size for all , so the ASRGAN can be trained without an additional model for each image size and resolution.

3.3 Receiver Buffer Model and Video Quality Enhancement

The user device is equipped with a DNN that can enhance the quality of the compressed video chunks by using the SR technique. Here, we adopt the ASRGAN explained in Sec. 3.2 for the SR operation. The ASRGAN can dynamically select the depth of the neural network to be used for SR with only minimal additional parameters. Obviously, a larger depth leads to better quality of the resulting images; however, a longer processing time is required, and vice versa. Based on the ASRGAN, two decisions should be made at the user side: 1) the number of depths to be used for SR denoted by , and 2) the number of CPU cores to be used for operating the ASRGAN denoted by .

Since is carefully determined depending on the channel capacity, we assume that images can be successfully delivered to the user within the time duration of . The user device decides the appropriate and to operate ASRGAN for received images. Suppose that the identical depths and CPU cores are employed for enhancing the quality of images.

In order to measure the computing time required for SR, we consider the virtual receiver buffer in which the processing time for improving the quality of images by using the ASRGAN with the depth and CPU cores is accumulated. The buffer dynamics in each time slot can be represented by and , where is the arrival process of the receiver buffer . The departure rate is obviously a constant, because goes by between two consecutive slots. The arrival process can be described by


where is the task size (e.g., required CPU clocks) for enhancing the quality of the received chunks transcoded with the rate by operating the ASRGAN with , and is the processing time of the ASRGAN using CPU cores. Also, is the indicator function, and is the average CPU clocks for processing a bit. Since and is finite if , and we can assume .
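The receiver buffer can be sketched as a queue measured in seconds of pending SR work. The code below is a rough illustration under assumptions: the names are hypothetical, the processing time is taken to be the total task size divided by the aggregate clock rate of the selected cores, and a task size of zero stands for the case in which SR is skipped.

```python
def sr_processing_time(num_chunks, cycles_per_chunk, num_cores, core_clock_hz):
    """Assumed time (seconds) to enhance num_chunks chunks.

    cycles_per_chunk is the task size for one chunk, which depends on the
    transcoding rate and the selected depth; zero means SR is skipped.
    """
    if num_cores < 1:
        raise ValueError("at least one core is required")
    return num_chunks * cycles_per_chunk / (num_cores * core_clock_hz)


def update_receiver_buffer(buffer_sec, slot_sec, arrival_sec):
    """One-slot update: one slot duration of work departs, new work arrives."""
    return max(buffer_sec - slot_sec, 0.0) + arrival_sec
```

For instance, two chunks of 4e9 cycles each on two 2 GHz cores take 2 s; with a 1 s slot, the buffer grows whenever such a load arrives in every slot.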

After finishing the SR operation for the received images, the final output quality is determined. Let be the quality measure of the chunks that are transcoded with the rate at the transmitter and whose quality is enhanced by using the ASRGAN with the depth at the receiver.

3.4 Tradeoff among queuing delay, buffering time, video quality, power consumption, and GPU usage

The video delivery latency in our wireless streaming system consists of the queuing delay, the chunk processing delay (i.e., the delay required for the SR operation), and the transmission time. Since we assume that is appropriately determined to be successfully delivered within one slot under the channel capacity , the transmission time can be ignored compared to the queuing delay and the image processing delay. In this model, the queuing delay is caused when video chunks are waiting to depart from the transmitter queue. If the channel condition is weak, the server cannot transmit a large number of high-quality chunks, which could generate queuing delays. In this case, it is better to compress chunks and to transmit many of them. In addition, the server should consider the power consumption. If the battery power is not sufficient, then the number of transmitted images is limited and the queuing delay increases.

Additionally, the chunk processing delay results from the operation of the SR technique, and the average chunk processing delay is proportional to the average length of the receiver buffer, similar to the transmitter queuing delay. If the received chunks are compressed with the high rate , the user should choose large to pursue the high quality. Depending on and , the task size is determined, and the task becomes heavy which causes the processing delay as the user enhances the quality of the image more. In this case, a choice of large could limit the excessive processing delays; however, in order to support the multi-programming system at the edge device, usage of CPU cores has to be also limited. Accordingly, the chunk processing delay depends on , , and . Also, the streaming delay or playback stall that the user experiences is closely related to the average chunk processing delay. In general, the buffering time is given to the user for receiving initial parts of the stream before the video playback, and the user experiences the streaming delay if the average buffer length of is longer than the given buffering time. The buffering time analysis will be explained in detail in Sec. 5.2.

According to [50], the average queuing delay is proportional to the average queue length; therefore, we can reduce both the transmitter queuing delay and the chunk processing delay (i.e., buffering time) by limiting the transmitter queue backlog and the receiver buffer length. To this end, the Lyapunov optimization theory [51] shows that the time-average queue backlogs can be limited by pursuing strong stability of both the transmitter and receiver queues as follows:


Based on the Lyapunov optimization theory, the upper bound on the time-average queue length is also derived by using the algorithm which minimizes the Lyapunov drift [51] and finally the queuing delay and the chunk processing delay can be limited by achieving queue stability in (14). In this respect, many delay-constrained transmission policies which limit the queueing delay by pursuing the queue stability have been proposed in [52, 53]. In this paper, simulation results in Section 6 show that the queueing delay can be reduced by ensuring (14), i.e., strong stability of the queueing system.
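For reference, strong stability in the sense of [51] can be written as below. This is a sketch under assumed notation: $Q(t)$ and $Z(t)$ are hypothetical names for the transmitter queue backlog and the receiver buffer length at slot $t$.

```latex
\begin{align}
  \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[ Q(t) \right] &< \infty, \\
  \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[ Z(t) \right] &< \infty.
\end{align}
```

Since the average delay is proportional to the average queue length, bounding these time averages is what bounds the queuing delay and the chunk processing delay.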

Note that the average quality of the received chunks depends on the decisions at both the transmitter and receiver sides. Therefore, decision parameters, i.e., , , , , and are jointly optimized for pursuing the high video quality, limiting the queuing delay and the chunk processing delay, and saving the transmit power consumption and CPU usage. Among these performance metrics, we can observe a variety of tradeoffs. First of all, the high-quality chunks require small and large , which results in an increase of delays. The server can reduce the queuing delay while transmitting the high-quality images by consuming large transmit power. Similarly, at the user side, the chunk processing time can be limited while pursuing high quality at the expense of heavy usage of CPU cores, i.e., .

Thus, decisions on , , , , and have to be carefully made depending on the current channel condition, and both transmitter and receiver states. We can imagine that a central controller gathers the network information from both transmitter and receiver and dynamically controls the video delivery and its quality enhancement. In general, the streaming service provider is deployed with the powerful server; therefore, the scenario in which the transmitter (i.e., the server) can observe the channel state information, deliver the desired chunks with appropriate quality, and let the receiver know how many depths of the ASRGAN and CPU cores are required is possible.

4 Joint Optimization of Dynamic Video Delivery and Quality Enhancement

This section introduces the joint optimization problem of dynamic image delivery and quality enhancement that pursues the high-quality video, the limited latency, and the efficient uses of transmit power and receiver CPU. Also, the Lyapunov-based decision method for solving the problem is presented.

4.1 Problem Formulation

As explained in Section 3, the proposed video delivery scheme jointly makes decisions on the number of transmitting chunks, the transcoding rate and the transmit power at the transmitter side, the number of depths of the ASRGAN, and the number of CPU cores at the receiver side in every time slot. We suppose that the perfect channel state information (CSI) is known at the central controller or the transmitter. After they observe their own queue and buffer states respectively, the decisions are made for pursuing the average image quality. The joint optimization problem is described as follows:

s.t. (16)

where is the maximum quality measure, is the threshold for the average transmit power, is the threshold for the average GPU usage, and is the power budget. Also, and are the sets of available depths of the ASRGAN and the available CPU cores, respectively. Since we adaptively choose the number of transmitting and receiving chunks, the objective function in (15) is the long-term time-averaged quality degradation of the received chunks. Also, , and , , and are defined in a similar manner. Specifically, the expectation of (15)–(19) is with respect to random channel realizations. The constraints of (16) and (17) are for limiting the queueing delay and the chunk processing delay. The transmit power consumption and usage of CPU cores are limited by the constraints of (18) and (19), respectively, and the constraint (20) comes from (1), which demonstrates that decisions on and depend on the channel capacity.

4.2 Min-Drift-Plus-Penalty Algorithm

The problem in (15)–(22) can be solved by the theory of Lyapunov optimization [51]. We first transform the inequality constraints of (18) and (19) into the forms of queue stability. Specifically, define the virtual queues and with the following update equations:


The strong stability of the virtual queues and pushes the averages of and to be smaller than and , respectively.
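The role of a virtual queue can be illustrated with a minimal sketch. It assumes the standard Lyapunov-optimization recursion, in which the backlog grows by the per-slot usage and drains by the budget, floored at zero; this is a generic form and may differ from the update equations of this paper. The function names, the power values, and the budget are hypothetical.

```python
def update_virtual_queue(backlog: float, usage: float, budget: float) -> float:
    """One step of the assumed recursion: backlog <- max(backlog + usage - budget, 0)."""
    return max(backlog + usage - budget, 0.0)


def run_virtual_queue(usages, budget):
    """Return the backlog after each slot for a sequence of per-slot usages."""
    backlog = 0.0
    trajectory = []
    for usage in usages:
        backlog = update_virtual_queue(backlog, usage, budget)
        trajectory.append(backlog)
    return trajectory


# Hypothetical per-slot transmit powers and an average-power budget of 1.0.
powers = [1.5, 0.5, 2.0, 0.0, 0.5]
trajectory = run_virtual_queue(powers, budget=1.0)
```

Under this recursion the final backlog is never smaller than the accumulated usage minus the accumulated budget, so a backlog that stays bounded forces the time-averaged usage toward the budget.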


be a concatenated vector of the actual and virtual queue backlogs. Define the quadratic Lyapunov function

as follows:


where , and are scaling coefficients for , , and , respectively. Then, let be a conditional quadratic Lyapunov drift on that is formulated as . According to Lyapunov optimization theory [51], if we suppose that the transmitter or the central controller could observe the current queue state , the dynamic policy achieving stability of the queues in (16)–(17) and (23)–(24) can be designed by minimizing an upper bound on drift-plus-penalty which is given by


where is a system parameter that gives a weight to the average video quality.

Here, the upper bound on the Lyapunov drift can be obtained as


where a constant is chosen to satisfy the following inequality:


Then, the upper bound on the conditional Lyapunov drift is given by


According to (26), minimizing a bound on drift-plus-penalty is consistent with minimizing


We now use the concept of opportunistically minimizing the expectations; therefore, (30) is minimized by the algorithm which observes the current queue state and chooses to minimize


Thus, we can reformulate the long-term problem of (15)–(22) into the opportunistic min-drift-plus-penalty problem at every slot as follows:


In (32)–(35), the dependency on the time slot is omitted for simplicity because the decisions are made independently at every time slot.

From (32)–(35), we can anticipate how the algorithm works. When the transmitter queue backlog is excessively long, many chunks are waiting to be delivered; therefore, the system can decide to transmit more chunks (i.e., large ) by compressing chunks with the large rate and/or consuming large power . In this case, when the receiver buffer is excessively large, heavy computational tasks have accumulated, so the receiver either skips the SR at the expense of quality degradation or uses many CPU cores to operate the SR. On the other hand, when the receiver buffer is almost empty, the receiver can handle the computational tasks for SR; therefore, a large depth of the ASRGAN is chosen to enhance the quality of the received chunks, and we can expect the user to experience high-quality streaming even with a small number of CPU cores. Meanwhile, if the transmitter queue backlog is short, the transmitter does not have to deliver a large number of chunks, so it can save power and the transmitted chunks do not need to be transcoded with a high rate. This means that the quantity of computational tasks required to provide high-quality streaming to the user is not excessively large.

System parameter in (31) is a weight factor for the term representing the measure of video quality degradation. The relative value of to remaining terms is important to control queue backlogs and quality measures at every time slot. The appropriate initial value of needs to be obtained experimentally because it depends on the channel environments, relationship between video quality and file size, constraints on performances (i.e., and ) and system coefficients , , . Also, should be satisfied. If , the user prefers low-quality videos even when the large number of chunks have already arrived at the user queue. Moreover, in the case of , the user only aims at accumulating queue backlogs without consideration of video quality. On the other hand, when , users do not consider the queue state, and thus they just request the highest-quality files. can be regarded as the parameter to control the trade-off between image quality and playback delay.

5 Adaptive Control Algorithm for Video Delivery and Computational Tasks for Quality Adaptation

This section proposes the adaptive algorithm, which controls the video delivery of the transmitter and the computational SR tasks of the receiver by solving the problem of (32)–(35). In addition, we analyze the buffering time required for a stable, smooth, and high-quality streaming system, and introduce the comparison techniques used to fairly evaluate the results of the proposed scheme.

5.1 Adaptive Quality Control of Stable, Smooth, and High-Quality Streaming

Since the expected queue length is proportional to the average queuing delay according to Little’s theorem, if the quality of transmitting chunks and transmit power are given, it is advantageous for the transmitter to deliver as many chunks in its queue as possible. Therefore, the inequality constraint on the data rate in (33) can be converted into the equality constraint. Also, the constraint (33) gives the relationship among , , and ; therefore, if and are given, according to (33), the optimal power is obtained as


There are still many decision parameters to be made, i.e., , , , and ; therefore, we formulate the subproblem with respect to and by considering and as constants. If and are given, we can reformulate the problem of (32)–(35) as follows:


Here, we denote the optimal decision parameters of (37) by and without any constraints. Then, the objective function of (37) is convex with respect to both and , and the above problem in (37) can be easily solved by using Karush–Kuhn–Tucker(KKT) conditions. Thus, the following proposition provides the optimal solution of the problem in (37).

Proposition 1.

When and are given and follows (36), the optimal and have to satisfy the following equations:

  • If :

  • If :


Since and is a nonnegative integer, in Proposition 1, we have to compare boundary conditions depending on the value of . If , four boundary conditions given by possible combinations of and have to be compared. Meanwhile, if , we compare three boundary conditions as follows: 1) and , 2) and , and 3) and . Thus, if and are already given, then the optimal and are obtained by using Proposition 1 and comparing the above four boundary conditions. Then, in order to find the optimal solution of the problem in (32)–(35), we can greedily test all joint combinations of decisions on and and finally can obtain . The details are given in Algorithm 1.
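The boundary comparison described above can be sketched as follows, assuming that the boundary conditions are the floor and ceiling neighbors of the relaxed (real-valued) KKT solution. The objective, the relaxed point, and the upper bounds are hypothetical placeholders, not the expressions of (37)–(40). For a general convex objective the integer optimum is not guaranteed to lie among these neighbors, so the sketch mirrors the comparison step rather than proving optimality.

```python
import math
from itertools import product


def best_integer_neighbor(objective, relaxed_point, upper_bounds):
    """Pick the integer point minimizing `objective` among floor/ceil neighbors."""
    candidates_per_axis = []
    for value, upper in zip(relaxed_point, upper_bounds):
        # Clip each coordinate to its feasible range [0, upper] before rounding.
        clipped = min(max(value, 0.0), float(upper))
        # A set collapses floor and ceil when the clipped value is already an integer.
        candidates_per_axis.append({math.floor(clipped), math.ceil(clipped)})
    return min(product(*candidates_per_axis), key=objective)


# Hypothetical convex objective whose relaxed minimizer is (2.4, 0.7).
def toy_objective(point):
    x, y = point
    return (x - 2.4) ** 2 + 2.0 * (y - 0.7) ** 2


best = best_integer_neighbor(toy_objective, (2.4, 0.7), upper_bounds=(5, 3))
```

When a relaxed coordinate is already an integer or is clipped to a bound, fewer than four candidates remain, which is analogous to the case with three boundary conditions mentioned above.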

  • : Time slots

  • : Time slot duration

  • : Transmission rate

  • : Threshold for consumption of average transmit power

  • : Threshold for usage of CPU cores

  • : Maximum queue length

  • : Lyapunov coefficient

Initialization: , , , , and .
for  do
    for  and  do
        Find and according to Proposition 1.
        Compare four boundary conditions (i.e., and ) and pick one of them minimizing (37).
        Compute by using (31).
        if  then
        end if
    end for
    Tx: Transcode video chunks with the rate .
    Tx: Transmit transcoded chunks with transmit power .
    Rx: Enhance the quality of the received images by using ASRGAN with depth and CPU cores.
end for
Algorithm 1 Adaptive Quality Control for Stable, Smooth, and High-Quality Streaming
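The per-slot decision step of Algorithm 1 can be illustrated with a simplified sketch. It replaces Proposition 1 with an exhaustive search over a small finite decision set, assumes a Shannon-capacity form for the rate constraint when computing the transmit power, and uses toy models for quality degradation and receiver processing. All names, parameters, and numerical values are hypothetical and are not taken from the paper.

```python
from itertools import product


def required_power(bits, slot_s, bandwidth_hz, channel_gain, noise_w):
    """Invert an assumed Shannon-capacity rate constraint, met with equality."""
    spectral_efficiency = bits / (slot_s * bandwidth_hz)
    return (2.0 ** spectral_efficiency - 1.0) * noise_w / channel_gain


def score(decision, state, channel_gain, p):
    """Toy drift-plus-penalty score; returns None if the power budget is violated."""
    n, q, d, c = decision  # chunks, transcoding level, SR depth, CPU cores
    power = required_power(n * p["bits_per_chunk"][q], p["slot_s"],
                           p["bandwidth_hz"], channel_gain, p["noise_w"])
    if power > p["power_budget"]:
        return None
    degradation = p["degradation"][q] / (1.0 + d)       # assumed quality model
    processed = p["chunks_per_core"] * c / (1.0 + d)    # assumed receiver service rate
    return (p["V"] * degradation
            - state["Q"] * n                # sending drains the transmitter queue
            + state["B"] * (n - processed)  # receiving fills, SR drains the buffer
            + state["Z"] * power            # virtual queue for average power
            + state["Y"] * c)               # virtual queue for average CPU usage


def choose_decision(state, channel_gain, p):
    """Exhaustively search the finite decision set for the smallest score."""
    max_n = min(int(state["Q"]), p["max_chunks"])
    candidates = product(range(max_n + 1), range(len(p["bits_per_chunk"])),
                         p["depths"], p["cores"])
    scored = [(score(dec, state, channel_gain, p), dec) for dec in candidates]
    # Sending zero chunks needs zero power, so at least one candidate is feasible.
    feasible = [(s, dec) for s, dec in scored if s is not None]
    return min(feasible)[1]


params = {
    "V": 1.0, "slot_s": 1.0, "bandwidth_hz": 1e6, "noise_w": 1e-9,
    "power_budget": 1.0, "max_chunks": 4, "chunks_per_core": 2.0,
    "bits_per_chunk": [0.5e6, 1.0e6],  # two hypothetical transcoding levels
    "degradation": [4.0, 1.0],         # the lower-rate level degrades more
    "depths": [0, 1, 2], "cores": [1, 2, 4],
}
busy = {"Q": 10, "B": 0.0, "Z": 0.0, "Y": 0.0}
power_limited = {"Q": 10, "B": 0.0, "Z": 1e4, "Y": 0.0}
decision_busy = choose_decision(busy, 1e-6, params)
decision_limited = choose_decision(power_limited, 1e-6, params)
```

In this toy setting, a long transmitter queue with empty virtual queues leads to sending the maximum number of chunks at the higher rate, whereas a large backlog in the power virtual queue shifts the choice toward fewer chunks at the lower rate, which matches the qualitative behavior discussed in Section 4.2.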

5.2 Buffering Time Analysis

Owing to the data rate constraint of (20), the delivery of chunks at slot can be successfully completed within the slot duration. Since the user enjoys the chunks after the SR operation is completed, the buffer length becomes the delay time that the user experiences. For a representative example scenario, an online streaming service provides the user the desired chunks in sequence, and the buffering time is required for smooth playback at the beginning of the stream. When the buffering time is given, it can be said that the user does not experience the playback delay if . If we set the strict delay constraint given and , the following inequality has to be satisfied:


For the strict delay constraint , the inequality in (41) could be considered as the constraint of the problem in (37), and the problem is still convex, so it can also be solved by satisfying the KKT conditions.

Since and are integers, it is almost impossible to satisfy the equality condition of (41); therefore, the KKT conditions are the same as before, except for the addition of (41) so that Proposition 1 is still the solution if the inequality of (41) is satisfied. First, (38) satisfies the inequality of (41). Second, if we consider both (40) and (41) together, the maximum bound on is obtained as


Accordingly, if obtained from (39) is larger than , the following boundary conditions have to be compared:

  • If : 1) and , 2) and , and 3) and .

  • If : 1) and , and 2) and .

According to (42), a sufficiently long buffering time allows the user to receive many chunks in every slot. In this case, the user can receive many chunks ahead of the video playback during the buffering time . Meanwhile, when , the user risks playback stalls or streaming delays; therefore, the transmitter should deliver only a small number of chunks, and the user would not enhance the quality of the received chunks (i.e., ).

5.3 Comparison Techniques

Our proposed dynamic video delivery algorithm (i.e., Algorithm 1) has two important features: 1) coordination between the transmitter and the receiver, and 2) adaptive depth control of the ASRGAN. In order to verify the advantages of the proposed scheme, two comparison techniques having only one of the above features are introduced in this section.

5.3.1 Comp1: Separate Optimization at Transmitter and Receiver Sides

This comparison scheme does not allow coordination between the transmitter and the receiver, so the decision parameters are not jointly derived. Since the Lyapunov optimization process in Section 4.2 is developed under the assumption that decisions are made jointly through cooperation between the transmitter and the receiver, the optimization problem must be reformulated so that it can be solved separately at the transmitter and receiver sides. At the transmitter side, , , and are determined from the following problem:

s.t. (43)

Similar to the Lyapunov optimization process in Section 4.2, the min-drift-plus-penalty algorithm at the transmitter side can be derived as

s.t. (44)

Here, the transmitter greedily finds the optimal , , and that minimize the objective function of the problem in (44). On the other hand, the receiver side makes the optimal decisions on and by using the following problem:

s.t. (45)

and it can be converted to the following min-drift-plus-penalty algorithm:

s.t. (46)

Similarly, we also assume that the receiver greedily finds the optimal solution (i.e., and ) of the problem in (46).

5.3.2 Comp2: Non-Adaptive Super-Resolution Network

This comparison scheme does not employ the ASRGAN; therefore, the SR network is optimized for a fixed depth, and the receiver cannot control the computational tasks of the SR network depending on the quality of the received chunks and the buffer state. We assume that the maximum depth of the SR network is always used; therefore, ‘Comp2’ solves the identical problem of (32)–(35) but . Note that it is not a simplified version of ‘Comp1’, because the usage of CPU cores is still jointly determined with the transmitter decisions (i.e., , , and ) and the SR network of ‘Comp2’ is different from that of ‘Comp1’. Since ‘Comp1’ needs to adaptively control the depth of the neural network, its SR network is optimized for multiple outputs taken from the ends of different depths; however, the neural network of ‘Comp2’ is optimized for a fixed depth only.

Fig. 2: The transition of average PSNR and average SSIM by training.
[Figure panel labels, repeated in two rows: Bicubic, Depth 5, Depth 10, Depth 15, Depth 20, Depth 25]