I Introduction
360-degree video technology has become increasingly popular with the growing demand for interactive applications. By wearing a head-mounted display (HMD), 360-degree video users can freely move their heads to change the viewing direction, which provides an immersive viewing experience. To improve the quality of experience (QoE) [1], most 360-degree videos have 6K or even higher resolution. Streaming such high-resolution videos is non-trivial because of the limited bandwidth of wireless communication channels [2]. In addition, there has been a recent trend toward high-quality 360-degree video content creation using 3D panoramic VR cameras. Compared to traditional live streaming, 360-degree video live streaming is considerably more challenging due to its panoramic nature [3], and it has more stringent QoE requirements to prevent motion sickness.
Many published works have investigated 360-degree video streaming [4, 5, 6, 7, 8]. For example, the authors of [5] designed a rate adaptation algorithm that maximizes the defined QoE metrics for 360-degree video streaming. Given the field of view (FOV) and bandwidth estimation, Xie et al. [6] proposed a probabilistic tile-based adaptive 360-degree video streaming system, named 360ProbDASH, which combines viewport adaptation and rate adaptation to solve a QoE-driven optimization problem. In [7], the authors presented Vortex, a live VR video streaming system that works in a computationally and bandwidth-efficient manner.
In 360-degree video live streaming applications, a capturing device records the scene and transmits the captured video to the server in real time. Multiple users can then access and watch the 360-degree video with their HMDs by downloading the corresponding video content from the server. Under a limited bandwidth constraint, the uplink rate (VR cameras to video server) and the downlink rate (video server to end users) should be carefully selected to improve the viewing experience.
In this paper, we formulate this transmission problem as a nonlinear integer programming problem. To solve it, we propose an optimal algorithm that combines the Karush-Kuhn-Tucker (KKT) conditions and the branch and bound method. Extensive simulations are conducted based on real-world LTE network traces, and the results show that the proposed solution significantly improves users’ QoE in comparison with baseline schemes. To the best of our knowledge, this is the first work to jointly consider rate adaptation for the coupled uplink and downlink in a 360-degree video live streaming system.
II System Overview
In this section, we present our 360-degree video live streaming system, which jointly allocates uplink and downlink wireless resources to maximize the overall QoE.
II-A System Model
We consider a scenario where multiple users are watching a 360-degree live video stream, as illustrated in Fig. 1. Multiple cameras are installed in the 360-degree video capturing device, and each camera can record high-quality 2D video. Constrained by the limited wireless uplink bandwidth, the recorded videos may not be transmitted to the video server at the highest quality. Assuming that each camera is capable of encoding the original video into several versions with different bitrates, the version with the most appropriate bitrate should be selected and uploaded to the video server. At the server, the video is partitioned into tiles, and the tiles covering each user’s FOV are transmitted to that user. The downlink transmission is also limited by the bandwidth. Therefore, one representation is selected for every tile and transmitted to the user according to the corresponding FOV and channel status, under the assumption that such information is reported to the video server in real time. For each tile, the best quality received by a user cannot exceed the quality of the video uploaded by the camera that captured the corresponding content. Thus, this scenario can be modeled as a coupled uplink and downlink video transmission problem.
In this paper, we consider this coupled uplink and downlink transmission system to maximize the QoE of all users through rate adaptation. The architecture is shown in Fig. 2. The system consists of four parts: a 360-degree video capturing device with cameras, processing modules on both the server side and the client, and the end users wearing HMDs. Each camera can encode the captured video into different video qualities, expressed as different constant bitrates, but only one video bitrate can be uploaded to the video server.
The server side includes two modules: the QoE-driven uplink processing module and the downlink processing module. The uplink processing module is responsible for obtaining a raw 360-degree video by stitching the uploaded videos. The downlink processing module selects the appropriate video quality for each tile based on the FOVs and channel information from all the users. The system workflow is as follows: the uplink processing module selects a bitrate level for each camera. Then, the videos with different qualities are stitched and projected into a panoramic video. Next, the downlink processing module divides the raw panoramic video into tiles and encodes each tile into different quality representations. With the help of feedback information from the client side, the downlink processing module can determine the range of tiles to be transmitted and select the appropriate quality level for each tile. Finally, all the selected tiles are transmitted through the downlink channel to maximize the users’ QoE.
At the client side, as the tiles are received from the server, they are decoded, projected, rendered and displayed in the HMD. In addition, the client side sends the user’s FOV and channel information in real time to the video server through each user’s uplink feedback channel.
II-B Problem Formulation
Suppose the system serves multiple users, and the evenly deployed cameras form a 360-degree video capturing system. Each camera can record and generate video at several bitrate levels, and the bitrate uploaded by a camera is determined by the bitrate level selected for it. The total bandwidth of the uplink channel is limited. When the videos generated by the cameras are uploaded to the server, the server processes them to produce a 360-degree video. Prior to downlink transmission, the 360-degree video is divided into tiles, and each tile is encoded into several representations at different quality levels, in the same way that HTTP-based DASH prepares video. Each tile therefore has one bitrate per representation in every group of pictures (GOP). As mentioned above, the quality of each video in the downlink cannot exceed the quality of the corresponding video in the uplink transmission. The bandwidth of the downlink channel is specified per GOP, so the k-th GOP has its own bandwidth value. For each user, the FOV determines the set of tiles that the user sees. The expected QoE of a user can then be defined as follows:
(1) |
In Eq. (1), the first term sums the perceived quality of the tiles delivered to the user. It uses the bitrate of each tile at each bitrate level in the k-th GOP, where the label on this bitrate stands for the “downlink” channel, together with a binary variable that equals 1 if the tile is transmitted at that bitrate level in the k-th GOP, and 0 otherwise. A mapping function converts the bitrate of a tile into the quality perceived by the user; it takes the form of the logarithm of the received bitrate. The second term captures the impact of stalls. We assume that a stall occurs when the bandwidth for the k-th GOP is less than the video bitrate, and that the stall time is approximately equal to the duration of the k-th GOP. (Footnote 1: In a practical system, due to the existence of the playback buffer, stalls are related to the transmission rate, the playback speed and the buffer status. Once the buffer drains, a stall will occur. In this manuscript, to simplify the formulation, we assume that a stall occurs when the bandwidth value in a GOP is less than the video bitrate. How to model this more precisely is left as future work.) The stall term uses an indicator function whose value is 1 only when the video bitrate in a GOP is greater than the bandwidth, and 0 otherwise, multiplied by the duration of that GOP. The bandwidth when transmitting the k-th GOP can be taken as the average bandwidth over the duration of that GOP. In addition, the third term considers the quality switches between consecutive GOPs; it uses the bitrate and the binary selection variable of the same tile in the adjacent GOP, which are defined in the same way. Finally, two non-negative weight constants balance the three factors. With this QoE model, our uplink and downlink optimization problem is defined as follows:
Problem 1:
(2) |
s.t.
(3) |
(4) |
(5) |
(6) |
(7) |
The optimization variables are the two sets of binary selection variables, one for the uplink and one for the downlink, where the label on the uplink quantities stands for “uplink”. The uplink variable equals 1 when the video from a camera is transmitted at a given bitrate level, and 0 otherwise. The downlink variable is the same binary variable as in Eq. (1), which equals 1 if a tile is transmitted with a given representation in the k-th GOP, and 0 otherwise. The uplink bitrate term denotes the bitrate of a camera uploaded at a given bitrate level. Constraints (3)-(4) apply to the uplink. Constraint (3) indicates that the video of each camera can be uploaded at only one quality level. Constraint (4) specifies that the total bitrate of all the uploaded videos cannot exceed the total bandwidth of the uplink. Constraints (5)-(6) are the downlink constraints. Constraint (5) ensures that only one representation is selected for each tile in the k-th GOP and transmitted to the client side. Constraint (6) ensures that the sum of the tile bitrates cannot exceed the total bandwidth of the downlink channel. Constraint (7) couples the uplink and downlink: it ensures that the quality level of a tile in the downlink cannot exceed the quality level of the video generated by the corresponding camera and transmitted in the uplink.
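The QoE model and the constraints described above can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the authors' implementation: the names `user_qoe` and `is_feasible`, the weights `alpha` and `beta`, the natural logarithm, the absolute bitrate difference as the switching measure, the comparison of the summed FOV tile bitrates against the GOP bandwidth, the tile-to-camera map `tile_to_cam`, and the comparison of level indices across the uplink and downlink bitrate ladders are all choices made here for illustration.

```python
import math

def user_qoe(selected, bandwidth, gop_duration, alpha=1.0, beta=1.0):
    """Three-term QoE: quality - alpha * stall time - beta * quality switching.

    selected[k][t]  : bitrate (Mbps) chosen for FOV tile t in GOP k (assumed layout)
    bandwidth[k]    : average downlink bandwidth (Mbps) during GOP k
    gop_duration[k] : duration (s) of GOP k
    """
    quality = stall = switching = 0.0
    for k, tiles in enumerate(selected):
        # Quality term: logarithm of the received bitrate of every FOV tile.
        quality += sum(math.log(rate) for rate in tiles.values())
        # Stall term: the indicator is 1 when the GOP bitrate exceeds the
        # bandwidth; the stall time is approximated by the GOP duration.
        if sum(tiles.values()) > bandwidth[k]:
            stall += gop_duration[k]
        # Switching term: bitrate change of each tile between consecutive GOPs.
        if k > 0:
            prev = selected[k - 1]
            switching += sum(abs(rate - prev[t])
                             for t, rate in tiles.items() if t in prev)
    return quality - alpha * stall - beta * switching

def is_feasible(cam_level, tile_level, cam_rates, tile_rates,
                uplink_bw, downlink_bw, tile_to_cam):
    """Check constraints (4), (6) and (7) for one GOP.

    Storing exactly one level index per camera and per tile satisfies the
    one-of constraints (3) and (5) by construction.
    """
    # (4): the total uploaded bitrate must fit in the uplink bandwidth.
    if sum(cam_rates[l] for l in cam_level) > uplink_bw:
        return False
    # (6): the total bitrate of the transmitted tiles must fit in the downlink.
    if sum(tile_rates[l] for l in tile_level) > downlink_bw:
        return False
    # (7): a tile's quality level cannot exceed that of its source camera
    # (assumption: level indices are comparable across the two ladders).
    return all(tile_level[t] <= cam_level[tile_to_cam[t]]
               for t in range(len(tile_level)))

# Example with two cameras, four tiles, and the bitrate ladders of Section IV-A.
cam_rates = [1.5, 2.0, 2.5, 3.0]
tile_rates = [0.2, 0.6, 1.0, 1.4]
print(is_feasible([3, 1], [3, 2, 1, 0], cam_rates, tile_rates, 6.0, 4.0, [0, 0, 1, 1]))
print(user_qoe([{0: 1.0, 1: 1.0}, {0: 1.4, 1: 1.0}], [3.0, 2.0], [1.0, 1.0]))
```

Representing each selection as a single level index keeps the sketch short; a solver for Problem 1 would instead search over these indices to maximize the summed QoE of all users.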
III QoE-Driven Optimal Rate Adaptation Algorithm
In this section, we introduce the optimal solution of the above problem, which is a nonlinear integer programming problem and can be proven to be NP-hard. We first approximate the indicator function in the QoE model with a logarithmic function, so that the QoE function becomes continuous, and relax the binary variables to continuous variables. Because the constraints in Problem 1 satisfy the linear constraint qualification, the relaxed problem can be solved by applying the KKT conditions to its Lagrangian function. Starting from the solution of the relaxed problem, we can then obtain the optimal solution of the original problem.
III-A KKT Condition for the Relaxed Continuous Problem
First, we approximate the indicator function in the QoE model with a logarithmic function and relax the binary variables to continuous variables. The relaxed version of Problem 1 can then be solved using the KKT conditions. The Lagrangian function of the problem is given in Eq. (8):
(8) |
where
(9) |
(10) |
(11) |
(12) |
(13) |
Thus, we can obtain the relevant KKT conditions:
(14) |
(15) |
(16) |
(17) |
(18) |
(19) |
By solving equations (14)-(19), which are associated with the KKT condition, we can derive the optimal solution of the relaxed nonlinear problem. Next, we use the branch and bound method to find the solution of the original binary programming problem.
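For reference, the general form of the KKT conditions can be written as follows. This is the textbook statement for a generic problem of maximizing $f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$, in generic notation chosen here for illustration; it is not a reconstruction of Eqs. (8)-(19), whose terms depend on the specific QoE function and constraints of Problem 1.

```latex
\begin{align}
  \mathcal{L}(x, \lambda, \mu)
    &= f(x) - \sum_{i} \lambda_i\, g_i(x) - \sum_{j} \mu_j\, h_j(x)
    && \text{(Lagrangian)} \\
  \nabla_x \mathcal{L}(x^{*}, \lambda^{*}, \mu^{*}) &= 0
    && \text{(stationarity)} \\
  g_i(x^{*}) \le 0, \quad h_j(x^{*}) &= 0
    && \text{(primal feasibility)} \\
  \lambda_i^{*} &\ge 0
    && \text{(dual feasibility)} \\
  \lambda_i^{*}\, g_i(x^{*}) &= 0
    && \text{(complementary slackness)}
\end{align}
```

In the relaxed problem, the inequality constraints would correspond to the bandwidth limits, the coupling constraint, and the bounds that keep each relaxed variable between 0 and 1, while the one-of selection constraints play the role of the equalities.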
III-B Branch and Bound Method
The branch and bound method designed to solve Problem 1 is shown in Table I. The initial inputs are the solution to the corresponding relaxation problem, obtained from the KKT conditions, and the corresponding optimal objective function value. The outputs are the 0–1 variable solution and the corresponding optimal objective function value.
Input: The optimal solution of the relaxation problem, the optimal objective function value of the relaxation problem, and a random value in the range (0,1).
Output: The optimal solution of Problem 1 and the optimal objective function value of Problem 1.
Initial:
Choose any solution that does not meet the 0–1 constraints from the solution of the relaxation problem.
IF
Add the constraint to Problem P-1 to form subproblem I.
ELSE
Add the constraint to Problem P-1 to form subproblem II.
END IF
k++, find the solutions to the relaxation problems in subproblems I and II and their optimal objective function values.
Find the maximum value of the optimal objective function and use it as the new upper bound.
Then, find the maximum value of the objective function from the branch that meets the 0–1 condition and use it as the new lower bound.
IF
Cut off the branch
ELSE IF
Go to step 2
ELSE The optimal solution of Problem 1 and its optimal objective function value have been found.
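The bounding and pruning logic of a branch and bound search can be sketched on a small stand-in problem. The code below is an illustration only, not the procedure of Table I: instead of Problem 1 with its KKT-based relaxation, it solves a plain 0-1 knapsack problem and obtains the upper bound from the greedy solution of the continuous relaxation, so that the example stays self-contained. The function name `branch_and_bound` and the sample data are assumptions made here.

```python
def branch_and_bound(values, weights, capacity):
    """Maximize sum(values[i] * x[i]) subject to
    sum(weights[i] * x[i]) <= capacity and x[i] in {0, 1}.
    Assumes all values and weights are positive."""
    n = len(values)
    # Sort by value density so the continuous relaxation can be solved greedily.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def relaxed_bound(pos, value, room):
        # Upper bound: optimum of the relaxation (0 <= x <= 1) over the
        # variables that have not been fixed yet.
        for i in order[pos:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    best_value, best_x = 0.0, [0] * n   # lower bound from feasible 0-1 solutions
    stack = [(0, 0.0, capacity, [])]    # (next position, value, room left, fixed bits)
    while stack:
        pos, value, room, fixed = stack.pop()
        if pos == n:
            # All variables are fixed: update the lower bound if improved.
            if value > best_value:
                best_value = value
                best_x = [0] * n
                for p, bit in enumerate(fixed):
                    best_x[order[p]] = bit
            continue
        if relaxed_bound(pos, value, room) <= best_value:
            continue  # prune: this branch cannot beat the current lower bound
        i = order[pos]
        # Branch on the next variable: subproblem with x_i = 0 and with x_i = 1.
        stack.append((pos + 1, value, room, fixed + [0]))
        if weights[i] <= room:
            stack.append((pos + 1, value + values[i], room - weights[i], fixed + [1]))
    return best_value, best_x

best_value, best_x = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best_value, best_x)
```

The same structure applies when the relaxation is solved through the KKT conditions: the relaxed optimum gives the upper bound, any branch whose solution meets the 0-1 condition gives a lower bound, and branches whose upper bound does not exceed the lower bound are cut.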
IV Performance Evaluation
We conduct experiments to verify the performance of the proposed QoE-driven 360-degree video live streaming system.
IV-A Simulation Setup
During the simulation, we select 6 videos captured by an Insta360 Pro 2 panoramic camera as the original videos at the uplink. The highest resolution of each video is . The video duration is 35 s, and the frame rate is 30 fps. Figure 3(a) shows a snapshot of the six original videos.
Then, we compress each original video using high efficiency video coding (HEVC) with constant bitrate (CBR) rate control, with the bitrate representations of each video being {1.5, 2, 2.5, 3} Mbps. We use AVP software (Kolor Autopano Video Pro) to complete the synthesis and stitch the VR videos together. Subsequently, the panoramic VR video is divided into 16 tiles, and each tile covers degrees. We assume that the video server can provide 4 different bitrate representations for each tile at constant bitrates of {0.2, 0.6, 1, 1.4} Mbps. Figure 3(b) shows the panoramic VR video synthesized from the 6 original videos, which is then divided into 16 tiles. At the client side, we assume that there are 100 users wearing an HTC Vive as the HMD device. The users are located at random distances from the base station, and they can request and watch 360-degree videos. The channel information can be calculated based on the fixed transmission power of the base station. During the simulation, we assume that each user’s FOV ( degrees) is randomly distributed in the 360-degree video and that the FOV information is transmitted to the video server in real time.
IV-B Simulation Results
To better verify the performance of the proposed scheme and make the simulation more realistic, we conduct simulations using real-world network traces [9], with and without perfect knowledge of future network conditions (i.e., the bandwidth prediction is either 100% correct or only partially correct). We use two baseline schemes to show the advantages of the proposed scheme (denoted as Algorithm 1). In the first baseline (denoted as Algorithm 2), the uplink bandwidth resources are evenly distributed among the different cameras, and the downlink uses the same adaptive allocation algorithm as our method. In the second baseline (denoted as Algorithm 3), the uplink bandwidth resources are evenly distributed among the different cameras, and the downlink bandwidth resources are equally distributed among the different tiles. We first consider the case with perfect knowledge of future network conditions and use the bandwidth value of each GOP as the input for the simulation. Then we consider the case with imperfect knowledge of future network conditions. According to the state-of-the-art bandwidth prediction scheme [10], we add Gaussian random noise with a mean of 0 and a variance of 1, scaled by 30%, to the LTE trace to simulate the prediction error. We then run the optimization with the predicted bandwidth (which differs from the real bandwidth) as the input, and use the real bandwidth to calculate the value of the objective function.
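The imperfect-prediction case can be sketched as follows. The exact way the noise is combined with the trace is not spelled out in the text, so this sketch assumes one plausible reading: the predicted bandwidth equals the real bandwidth multiplied by (1 + 0.3 z), where z is standard Gaussian noise, clipped at zero. The function name `predicted_trace`, the `seed` parameter, and the sample trace values are hypothetical.

```python
import random

def predicted_trace(real_trace, scale=0.3, seed=0):
    """Return a noisy copy of a bandwidth trace (one value per GOP, in Mbps).

    Assumed reading of the setup: predicted = real * (1 + scale * N(0, 1)),
    clipped at zero so that the bandwidth never becomes negative.
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return [max(0.0, b * (1.0 + scale * rng.gauss(0.0, 1.0))) for b in real_trace]

real = [4.0, 3.5, 5.2, 2.8]          # hypothetical per-GOP bandwidth values
predicted = predicted_trace(real)    # input to the optimization
# The selected bitrates are then scored against `real`, not `predicted`.
print(predicted)
```

Keeping the real trace for scoring is what exposes the two failure modes of a wrong prediction: overestimation leads to stalls, and underestimation leaves bandwidth unused.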
The simulation results of both cases are shown in Table II. We observe that the proposed algorithm achieves a higher QoE value than both baselines on all three LTE traces. This is because it not only integrates the uplink and downlink resource allocation but also allocates the resources efficiently. Comparing the performance across traces, we also observe that a larger bandwidth results in a larger QoE value. The QoE value drops when the bandwidth prediction contains errors. If the predicted bandwidth value after adding Gaussian noise is larger than the real bandwidth value, the real bandwidth may not be able to satisfy the bitrate requirement, which causes stalling and quality switching. If the predicted bandwidth value is less than the real bandwidth value, some bandwidth may be wasted, which degrades the received video quality. Nevertheless, the proposed algorithm still outperforms the baseline schemes due to the joint consideration of uplink and downlink rate allocation.
Network Trace | Algorithm 1 QoE | Algorithm 2 QoE | Algorithm 3 QoE |
---|---|---|---|
Bicycle Trace | 8.9057 | 7.0652 | 6.2781 |
Predicted Bicycle Trace | 8.0236 | 6.2537 | 5.7966 |
Car Trace | 6.3325 | 5.2933 | 4.7337 |
Predicted Car Trace | 5.0321 | 4.1097 | 3.5282 |
Bus Trace | 5.0026 | 4.0219 | 3.5537 |
Predicted Bus Trace | 3.9803 | 3.1102 | 2.2576 |
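The relative improvement of the proposed scheme over each baseline can be derived directly from the values in Table II. The short sketch below only performs this arithmetic; the dictionary layout and the helper name `relative_gain` are choices made here for illustration.

```python
# QoE values copied from Table II: (Algorithm 1, Algorithm 2, Algorithm 3).
table_ii = {
    "Bicycle Trace": (8.9057, 7.0652, 6.2781),
    "Predicted Bicycle Trace": (8.0236, 6.2537, 5.7966),
    "Car Trace": (6.3325, 5.2933, 4.7337),
    "Predicted Car Trace": (5.0321, 4.1097, 3.5282),
    "Bus Trace": (5.0026, 4.0219, 3.5537),
    "Predicted Bus Trace": (3.9803, 3.1102, 2.2576),
}

def relative_gain(proposed, baseline):
    """Percentage by which `proposed` exceeds `baseline`."""
    return 100.0 * (proposed - baseline) / baseline

for trace, (a1, a2, a3) in table_ii.items():
    print(f"{trace}: +{relative_gain(a1, a2):.1f}% vs Algorithm 2, "
          f"+{relative_gain(a1, a3):.1f}% vs Algorithm 3")
```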
V Conclusions
In this paper, we proposed a multi-user QoE-driven 360-degree video live streaming system, which jointly considers the uplink and downlink transmissions. In our system, the server selects the optimal bitrate settings for both the uplink and downlink channels based on the bandwidth information and the users’ real-time FOV to maximize the QoE value of all users. To achieve this, we proposed an algorithm that combined the KKT condition and branch and bound method to solve the defined rate adaptation problem. Finally, the simulation results based on the real-world network traces demonstrated that our proposed algorithm outperformed other baseline schemes.
References
- [1] J. Li, R. Feng, Z. Liu, W. Sun, and Q. Li, “Modeling QoE of virtual reality video transmission over wireless networks,” in 2018 IEEE Global Communications Conference (GLOBECOM), Dec. 2018.
- [2] Z. Liu, S. Ishihara, Y. Cui, Y. Ji, and Y. Tanaka, “JET: Joint source and channel coding for error resilient virtual reality video wireless transmission,” Signal Processing, vol. 147, pp. 154–162, 2018.
- [3] X. Liu, B. Han, F. Qian, and M. Varvello, “LIME: Understanding commercial 360 degree live video streaming services,” in Proceedings of the 10th ACM Multimedia Systems Conference, ser. MMSys ’19. New York, NY, USA: ACM, 2019, pp. 154–164.
- [4] X. Corbillon, A. Devlic, G. Simon, and J. Chakareski, “Optimal set of 360-degree videos for viewport-adaptive streaming,” in Proceedings of the 25th ACM International Conference on Multimedia, ser. MM ’17. New York, NY, USA: ACM, 2017, pp. 943–951.
- [5] A. Ghosh, V. Aggarwal, and F. Qian, “A rate adaptation algorithm for tile-based 360-degree video streaming,” arXiv e-prints, arXiv:1704.08215, Apr. 2017.
- [6] L. Xie, Z. Xu, Y. Ban, X. Zhang, and Z. Guo, “360ProbDASH: Improving QoE of 360 video streaming using tile-based HTTP adaptive streaming,” in Proceedings of the 25th ACM International Conference on Multimedia, ser. MM ’17. New York, NY, USA: ACM, 2017, pp. 315–323.
- [7] R. Konrad, D. G. Dansereau, A. Masood, and G. Wetzstein, “SpinVR: Towards live-streaming 3D virtual reality video,” ACM Trans. Graph., vol. 36, no. 6, pp. 209:1–209:12, Nov. 2017.
- [8] C. Guo, Y. Cui, and Z. Liu, “Optimal multicast of tiled 360 VR video,” IEEE Wireless Communications Letters, vol. 8, no. 1, pp. 145–148, 2018.
- [9] J. Van Der Hooft, S. Petrangeli, T. Wauters, R. Huysegems, P. R. Alface, T. Bostoen, and F. De Turck, “HTTP/2-based adaptive streaming of HEVC video over 4G/LTE networks,” IEEE Communications Letters, vol. 20, no. 11, pp. 2177–2180, 2016.
- [10] N. Hu and P. Steenkiste, “Evaluation and characterization of available bandwidth probing techniques,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 6, pp. 879–894, Aug. 2003.