A QoE Model in Point Cloud Video Streaming

by   Jie Li, et al.

Point cloud video has been widely used in augmented reality (AR) and virtual reality (VR) applications because it gives users an immersive experience with six degrees of freedom (6DoFs). Yet research on quality of experience (QoE) models for point cloud video streaming is still lacking, so streaming systems have no metric to optimize against. In addition, each point of a point cloud video carries both position and color information, and the 6DoFs viewing procedure introduces a viewport distance effect, so traditional objective quality metrics cannot be applied directly to point cloud video streaming systems. In this paper we first analyze the subjective and objective factors related to the QoE model. Then an experimental system that simulates point cloud video streaming is set up and detailed subjective quality evaluation experiments are carried out. Based on the collected mean opinion score (MOS) data, we propose a QoE model for point cloud video streaming. We also verify the model against actual subjective scores, and the results show that the proposed QoE model accurately reflects users' visual perception. Finally, we make the experimental database public to promote QoE research on point cloud video streaming.




I Introduction

In recent years, volumetric video has become increasingly popular, with many augmented reality (AR) and virtual reality (VR) applications in education, entertainment, medicine and other industries [18]. Compared with 360-degree video, volumetric video provides an immersive viewing experience with six degrees of freedom (6DoFs).

Currently, the most popular data format of volumetric video is the point cloud [2] [14] due to its flexibility and simplicity (in this paper we use the terms "volumetric video" and "point cloud video" interchangeably). A point cloud is composed of a large number of points in three-dimensional space, and each point contains spatial coordinates and RGB attributes. Because it carries more dimensions of information than regular video (including 360-degree video), point cloud video requires extremely large bandwidth to transmit. Taking a commonly used 30 frames per second (fps) video as an example, when the number of points is approximately 760,000 per frame, the volume of data per frame without any compression is approximately 12 MB, which means the point cloud video streaming bandwidth demand is up to 2.9 Gbps [11]. This requirement will be even higher when delivering high-quality video.
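The bandwidth figure above follows from simple arithmetic, which the short sketch below reproduces. It assumes 1 MB = 10^6 bytes and that the 12 MB figure is per frame; the function name is illustrative only.

```python
def raw_stream_bandwidth_gbps(frame_size_mb: float, fps: int) -> float:
    """Uncompressed streaming rate in Gbps, assuming 1 MB = 10**6 bytes."""
    bits_per_second = frame_size_mb * 1e6 * 8 * fps
    return bits_per_second / 1e9


# 12 MB per frame at 30 fps gives about 2.88 Gbps, i.e. "up to 2.9 Gbps".
print(raw_stream_bandwidth_gbps(12, 30))
```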

Adaptive streaming is a fundamental topic for many related applications. Similar to the Dynamic Adaptive Streaming over HTTP (DASH) used in traditional VR videos [24] [8] [26], point cloud video can be divided into 3D tiles and compressed into different quality representations, which are then uploaded to the video server [19]. Based on network analysis, prediction of the user's field of view (FoV) and the buffer status of the playback device, the client selects which tiles to request and at which quality representation, subject to the current bandwidth. Finally, the client obtains the selected tiles in the user's FoV from the server, merges them into a point cloud video, and renders it for viewing.

Point cloud video can be compressed to further reduce bandwidth demand. For example, the MPEG Point Cloud Compression (PCC) codec can achieve a fairly high compression ratio [21], which greatly reduces network bandwidth consumption during transmission. However, decompressing point cloud video consumes computational resources and takes a relatively long time. For example, a 511 MB volumetric video called [3] containing 30 raw frames can be compressed into a binary file of only 21 KB using the MPEG-VPCC-TMC2 v12.0 codec, but it takes approximately 34 seconds to decode (the experiments are carried out on a PC running 64-bit Windows 10, with an i5-7500 CPU, 16 GB RAM and one RTX2060 GPU).

A point cloud video streaming system should be optimized to maximize users' perceived quality of experience (QoE). Yet there is still a lack of research on the QoE model for point cloud video streaming. QoE models are generally based on either objective or subjective metrics. The former are usually measured by objective quality metrics such as Peak Signal to Noise Ratio (PSNR). The latter are generally obtained by conducting subjective user experiments, collecting the scores that reflect users' direct viewing experience, and finally deriving a QoE model from the mean opinion score (MOS).

The QoE model of point cloud video streaming differs from that of traditional systems [5]. First, the objective PSNR of traditional video usually represents only the color information of the video [27] and cannot represent the position information of points in point cloud video. The traditional PSNR therefore cannot be used directly to measure the quality of point cloud video, especially since the number and positions of points change after compression. Second, subjective QoE models usually include stall and quality switch caused by network bandwidth variations during streaming. Unlike traditional video, point cloud video frames can be compressed to reduce transmission time, but they then take a long time to decode. Third, the commonly used 3D tiling scheme in point cloud video streaming causes the viewport distance effect: the user perceives a quality switch in faraway tiles more weakly than one in nearby tiles. Therefore, the distance from each tile to the user's viewpoint affects the perceived quality switch and the user's visual quality, which is not the case for traditional video.

In this paper, to establish a more comprehensive QoE model for point cloud video streaming, we explore the influencing factors related to the QoE metric, which mainly include the objective quality of the point cloud video, and stall and quality switch during streaming. For objective quality, we extend PSNR to point cloud video as Point Cloud PSNR (PCPSNR), which integrates the position and color information of all points. For playback stall, we account for the changes in transmission time and decoding time caused by point cloud compression and decompression, so that stall time is measured more accurately. For quality switch, we also consider the viewport distance effect to better reflect how the distance from a tile to the user's viewpoint affects the perceived switch.

We set up an experimental system to simulate the point cloud video streaming procedure. First, the point cloud video is divided into 3D tiles and compressed into different quality representations. Second, the client performs quality selection based on the field of view (FoV) prediction and the network bandwidth, and obtains the quality representation of the tiles in the user's FoV. After that, the point cloud video can be rendered for viewing, and the stall time and quality switch can be calculated. Finally, the point cloud video is played with a self-developed player based on Unity, which is the most widely used 3D development platform [6], and subjective user experiments are conducted. We collect the subjective MOS data, analyze the relationship between MOS and the related influencing factors, and establish a QoE model of point cloud video streaming.

As far as we know, this is the first work to study the QoE model for point cloud video streaming. The contributions of this article are summarized as follows:

  1. We analyze the influencing factors related to QoE of PCC video streaming, which mainly include the objective quality of the point cloud video, and stall and quality switch during streaming.

  2. We propose a new metric, PCPSNR, to evaluate the objective quality of the 6DoFs point cloud video, which reflects the position information and color information of points simultaneously.

  3. We conduct detailed subjective experiments on the established experimental system. By analyzing the experimental data, an accurate QoE model for point cloud video streaming is obtained.

  4. We make the subjective experimental MOS data public, which may stimulate more research in this area.

The rest of this article is organized as follows. Section II introduces the point cloud video streaming system and its quality evaluation metric. Section III introduces in detail the factors affecting the QoE metric of volumetric video streaming. Section IV describes our experimental system. The experiment procedure and results are discussed in Section V and Section VI, respectively. Finally, we summarize the whole article in Section VII.

II Related Work

In this section, we first introduce volumetric video compression and streaming. Then we discuss objective quality metrics and subjective quality evaluation of point clouds. Finally, we introduce the related work of QoE modeling.

II-A Point Cloud Video Compression and Streaming

Video is usually compressed before transmission. Mekuria [16] developed an inter-prediction algorithm for real-time compression of 3D tele-immersive video, and proposed a general point cloud codec. Another common compression method is V-PCC recommended by MPEG [21], which first converts the dynamic point cloud into two independent volumetric video sequences, and then uses an encoder to compress the geometric and textural data of the point cloud video.

After compression, a volumetric video can be transmitted and the corresponding 6DoFs scene can be rendered for the user to view. Park [20] proposed a transmission system for volumetric video streaming based on 3D tiles, and introduced playback buffer to reduce latency. Han [7] designed a streaming system called ViVo to deliver high-quality volumetric content to commercial mobile devices. The authors proposed three visibility-aware optimizations in ViVo to effectively reduce mobile resource overhead, and studied point cloud encoding, decoding, segmentation and user’s FoV in detail. Li [12] proposed a volumetric video streaming framework for joint communication and computational resources allocation to maximize the defined QoE based on the user’s FoV and the point cloud video features. These tiles have different quality levels according to the user’s FoV, thus saving bandwidth by paying attention to what the user is watching.

II-B Objective and Subjective Quality Evaluation for Point Cloud Video

The quality of a point cloud is generally expressed by objective quality metrics. Mekuria [15] proposed two kinds of quality evaluation metrics that compare the original and compressed point clouds, obtained by calculating the point-to-point distance and the PSNR of each component in the YUV color space. Tian [22] proposed a point-to-plane metric for the quality of compressed point clouds. This metric is calculated by projecting the point-to-point distance onto the normal of the original point cloud plane where the corresponding point is located. Viola [25] proposed a color-based metric that represents the objective quality of a point cloud by extracting color features, and then combined geometry- and color-based methods to create a rendering-independent metric that better captures point cloud distortion. Because a unified metric that considers both position and color information is still lacking, a new objective metric that comprehensively represents the quality of a point cloud is needed.

Subjective quality evaluation of point cloud is mainly divided into static point cloud and dynamic point cloud. Zerman [28] conducted a subjective and objective evaluation on the TMC2 PCC software of volumetric video. After evaluation, volumetric video quality database and subjective data are publicly available. Hooft [23] studied the QoE of the point cloud in the case of 6DoFs adaptive streaming and considered changes in different network conditions and configurations. In the above studies, the user’s QoE was finally compared with objective quality metrics, which ignored the effect of network changes on the user’s QoE in volumetric video streaming. Therefore, a more specific QoE model is needed to more accurately reflect the user’s QoE.

II-C QoE Modeling

For QoE modeling, there have been many studies on traditional video and VR video. Liu [13] proposed a user experience model for DASH video that combines three factors affecting the user experience (initial delay, stall and level change) into an overall model. Filho [4] proposed PERCEPTION, which evaluates the user's QoE in adaptive VR video streaming: machine learning is used to predict the playback performance of VR video, which is then used to model and predict the user's QoE. Li [10] studied QoE modeling of VR video streaming over wireless networks. After considering the tiling and the user's FoV changes in VR video streaming, a QoE model of panoramic video wireless streaming was obtained through the analysis of experimental data. The above research on QoE modeling mainly focuses on traditional video and VR video, while there is little research on volumetric video streaming.

III QoE Model for Point Cloud Video Streaming

In this section, we discuss several factors related to QoE metric for point cloud video streaming.

III-A Overview

In point cloud video streaming, objective quality has the greatest impact on QoE, as it directly reflects the video quality received by users. We propose PCPSNR to better reflect the quality of point cloud video by integrating the position and color information of all points. In addition, due to the huge amount of point cloud data, current volumetric video streaming systems generally compress the video before transmission, and the clients then decode, render and play it on their devices. However, the transmission time, which depends on the network bandwidth, and the decoding time, which depends on the computational resources and the decoder, may cause playback to stall, which has a negative impact on QoE [12], [11]. Finally, due to the adaptive mechanism of the streaming system, users see video content at different quality representations, and the resulting quality switch also impairs QoE [10]. Based on the above analysis, we conclude that the factors affecting the QoE metric of volumetric video streaming are the objective quality of point cloud video represented by PCPSNR, and the stall and quality switch caused by the streaming system, as shown in Fig. 1.

Fig. 1: Influencing factors related to QoE metric in adaptive point cloud video streaming.

III-B Objective Quality

To obtain PCPSNR, we first calculate the distance quality metric based on the point-to-plane distance, then integrate the original PSNR of each YUV component into the color quality metric, and finally combine the distance metric and the color metric.

III-B1 Position information of points

For the calculation of the distance quality metric representing the position information of points, we use the point-to-plane distance [22]. As shown in Fig. 2, for each point $a_i$ in the received point cloud $A$, the nearest neighbouring point $b_j$ in the original reference point cloud $B$ can be found, and then the normal vector $n_j$ of the plane where the nearest neighbor is located is estimated through the r-radius neighborhood [9]. The final point-to-plane (P2Plane) distance is obtained by projecting the point-to-point distance onto the direction of the normal vector. In this paper, we take the root mean square distance (RMSD) as the P2Plane distance of the whole point cloud. Specifically, it is obtained by averaging the squared P2Plane distances between all the points in the received point cloud $A$ (the number of points is $N_A$) and their nearest neighbors in the reference point cloud $B$:

$$d_{rms}^{A,B} = \sqrt{\frac{1}{N_A} \sum_{i=1}^{N_A} \left(d_i^{A,B}\right)^2} \tag{1}$$

where $d_i^{A,B}$ is the P2Plane distance between point $a_i$ in the received point cloud $A$ and its nearest neighbor $b_j$ in the reference point cloud $B$.

Fig. 2: Point-to-point distance, point-to-plane distance and , , .

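To get a symmetric point cloud distance, we calculate the P2Plane RMSD from the received point cloud $A$ to the reference point cloud $B$ and from $B$ to $A$, and the maximum value is used as the final P2Plane distance:

$$d^{P2Plane} = \max\left(d_{rms}^{A,B},\; d_{rms}^{B,A}\right) \tag{2}$$

where $d_{rms}^{A,B}$ and $d_{rms}^{B,A}$ are the RMSD of the P2Plane distance from $A$ to $B$ and from $B$ to $A$, respectively.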
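As an illustration of the symmetric P2Plane distance, the sketch below uses a brute-force nearest-neighbour search in pure Python. It assumes unit normals are already available for every point (the r-radius normal estimation step is not reproduced), and all function names are hypothetical.

```python
import math


def p2plane_rmsd(src, ref, ref_normals):
    """RMSD of point-to-plane distances from every point in `src` to `ref`.

    src, ref: lists of (x, y, z) tuples.
    ref_normals: one unit normal per reference point (assumed precomputed).
    """
    squared = []
    for p in src:
        # Brute-force nearest neighbour in the reference cloud.
        j = min(range(len(ref)), key=lambda k: math.dist(p, ref[k]))
        error = [p[c] - ref[j][c] for c in range(3)]
        # Project the point-to-point error onto the neighbour's normal.
        proj = sum(error[c] * ref_normals[j][c] for c in range(3))
        squared.append(proj * proj)
    return math.sqrt(sum(squared) / len(squared))


def symmetric_p2plane(a, a_normals, b, b_normals):
    """Maximum of the two directed RMSD values (A to B and B to A)."""
    return max(p2plane_rmsd(a, b, b_normals), p2plane_rmsd(b, a, a_normals))
```

A real implementation would typically replace the quadratic search with a k-d tree; the brute-force loop is kept here only to make each step explicit.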

Then the distance quality metric of point cloud can be expressed as:


where is the maximum diagonal length of the Bounding-Box corresponding to each frame of the point cloud video.

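III-B2 Color information of points

For the calculation of the color quality metric representing the color information of points, we convert the RGB value contained in each point to YUV format according to the ITU-R Rec. BT.709 formula [1]. As shown in Fig. 2, the mean square error (MSE) of each YUV component between the points $a_i$ in the received point cloud $A$ and their nearest neighbors $b_j$ in the reference point cloud $B$ is calculated. For the Y component:

$$MSE_Y^{A,B} = \frac{1}{N_A} \sum_{i=1}^{N_A} \left(Y_{a_i} - Y_{b_j}\right)^2 \tag{4}$$

where $Y_{a_i}$, $U_{a_i}$ and $V_{a_i}$ are the Y, U, V values of point $a_i$ in $A$, respectively, and $N_A$ represents the number of points in $A$. The MSE of the U and V components is computed in the same way.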
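The color computation can be sketched as follows. The BT.709 luma and colour-difference coefficients are the standard ones; the pairing of each received point with its nearest reference neighbour is assumed to have been done already, and the function names are hypothetical. The last helper applies the symmetric-MSE PSNR step that the text describes next.

```python
import math


def rgb_to_yuv_bt709(r, g, b):
    """Convert one RGB triple to YUV with the ITU-R BT.709 coefficients."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    u = (b - y) / 1.8556
    v = (r - y) / 1.5748
    return y, u, v


def channel_mse(pairs, channel):
    """MSE of one YUV channel (0 = Y, 1 = U, 2 = V).

    pairs: list of (rgb_received, rgb_reference) for already-matched points.
    """
    total = 0.0
    for rgb_a, rgb_b in pairs:
        diff = rgb_to_yuv_bt709(*rgb_a)[channel] - rgb_to_yuv_bt709(*rgb_b)[channel]
        total += diff * diff
    return total / len(pairs)


def symmetric_psnr(mse_ab, mse_ba, peak=255.0):
    """PSNR computed from the larger of the two directed MSE values."""
    mse = max(mse_ab, mse_ba)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```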

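Similarly, we calculate the MSE from the received point cloud $A$ to the reference point cloud $B$ and from $B$ to $A$, and then take their maximum as the symmetric MSE to calculate $PSNR_Y$:

$$PSNR_Y = 10 \log_{10} \frac{p^2}{\max\left(MSE_Y^{A,B},\; MSE_Y^{B,A}\right)} \tag{5}$$

where $p$ represents the maximum value of the color information, which is equal to 255 in our case. $PSNR_U$ and $PSNR_V$ are calculated in the same way as $PSNR_Y$. According to [17], the color quality metric of point cloud can be expressed as: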


III-B3 PCPSNR Metric

Based on position and color information of points, we can define PCPSNR metric as follows:


It is worth noting that the proposed PCPSNR metric is also applicable to traditional 2D video. In that scenario, and PCPSNR will degenerate to , complying with quality metric definition of traditional 2D video.

III-C Stall Impairment

Stall during playback is negatively correlated with QoE. In traditional video streaming systems, when network conditions degrade, the required transmission time may become longer and playback will stall if the playback buffer is empty. In a point cloud video streaming system, due to the large amount of data in each frame, the server usually compresses the video to improve transmission efficiency, in addition to producing videos with different quality representations. This introduces extra decoding time on the client side, which is significantly longer than that required in traditional video decoding. Therefore, in a point cloud video streaming system, stall impairment is caused mainly by both downloading time and decoding time.

III-D Quality Switch Impairment

Similar to the stall, quality switch is another factor negatively correlated with QoE in point cloud video streaming. During playback, the client requests 3D tiles of different quality representations to adapt to time-varying network bandwidth. Therefore, users may experience a quality switch during the viewing process, which will cause QoE impairment. In addition, point cloud video has the characteristic of 6DoFs, which will bring the viewport distance effect. As illustrated in Fig. 3, since the blue tile is closer to the viewpoint than the red tile, it is obvious that the quality switch caused by the blue tiles is more perceptible to the user. This effect is called the viewport distance effect, which is caused by the different viewport positions of the 3D tiles. The viewport distance effect should be introduced in the measurement of quality switch impairment.

Fig. 3: Viewport distance effect: since the viewport position of the blue tile is closer to the viewpoint than the red tile, the quality switch of blue tiles will be more easily perceptible.

To sum up, as shown in Fig. 1, we can model the QoE metric with the PCPSNR, stall and quality switch of point cloud video during streaming. That is, $QoE = f(PCPSNR, T_{stall}, Q_{switch})$, where $f$ is the metric mapping function, $T_{stall}$ represents the stall duration, and $Q_{switch}$ represents the quality switch times. In the following sections, we will study the detailed QoE metric based on our adaptive point cloud streaming simulation system and subjective experiments.

IV Experiment Setup

IV-A Experimental System

In this section, we attempt to construct an adaptive point cloud video streaming experimental system, then conduct subjective experiments to analyze the relationship between QoE and the influencing factors, and finally establish a QoE model.

A widely used streaming system is shown in Fig. 4, which consists of a client and a server. The point cloud video is segmented into 3D tiles and each tile is encoded into compressed tiles with different quality representations. The client predicts the network bandwidth to determine the current network environment and receives the user's FoV information. After FoV analysis, the system can transmit only the tiles in the user's FoV to avoid unnecessary transmission, or it can transmit a larger area to tolerate prediction errors. The client also needs to monitor the current buffer state. Combining these three pieces of information, the client starts the quality selection process to determine the optimal quality representation of the tiles in the user's FoV. The results of the tile quality selection are uploaded to the server through the HTTP interface. Finally, the server sends the selected tiles to the client through the HTTP interface. The client decodes the received tiles, merges them, and then renders the video for the user to view.

Fig. 4: Adaptive point cloud video streaming system architecture.

As shown in Fig. 3, FoV analysis enables transmitting only the part of the point cloud video inside the user's field of view, which reduces the overhead of computational and communication resources. In an actual system, FoV information can be obtained from the user's head-mounted display (HMD) in real time, or it can be predicted from the user's most recent viewpoint information. Therefore, we simulate these two FoV analysis methods in the following sections.

A point cloud video is usually divided into Groups of Frames (GoFs) in the time dimension and into 3D tiles of the same size in the space dimension. For each tile, it can be assumed that there are $R$ kinds of quality representations. Define a binary variable $x_{k,r} \in \{0, 1\}$, where $x_{k,r} = 1$ means that tiles with quality level $r$ in GoF $k$ in the FoV are selected for transmission, and $x_{k,r} = 0$ otherwise. Here we assume that for GoF $k$, all the tiles in the FoV have the same quality level during transmission.

IV-B Calculations of Stall Duration and Quality Switch Times

IV-B1 Stall Duration and Buffer Dynamics

For GoF $k$, denote $T_k^{down}$ as the download time, $T_k^{dec}$ as the decoding time, and $T_k^{play}$ as the playback time. Then $T_k^{down}$ can be expressed as:

$$T_k^{down} = \sum_{r=1}^{R} x_{k,r} \frac{S_{k,r}}{B_k} \tag{8}$$

where $S_{k,r}$ is the data size of GoF $k$ in the user's FoV with quality level $r$ and $B_k$ represents the network bandwidth at the time of GoF $k$.

The decoding time $T_k^{dec}$ can be expressed as:

$$T_k^{dec} = \sum_{r=1}^{R} x_{k,r} D_{k,r} \tag{9}$$

where $D_{k,r}$ is the decoding time of GoF $k$ in the user's FoV with quality level $r$.

Then the total download and decoding time of GoF $k$ can be defined as:

$$T_k = T_k^{down} + T_k^{dec} \tag{10}$$

The playback time of GoF $k$ can be defined as:

$$T_k^{play} = \frac{N_k}{f} \tag{11}$$

where $N_k$ is the number of frames of GoF $k$ in the user's FoV, and $f$ is the frame rate, i.e., how many frames are played per second.
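The timing quantities above can be combined in a small sketch. The download, decoding and playback times follow the definitions in the text. The stall rule is an assumption made for illustration: with a one-GoF buffer, GoF k is fetched and decoded while GoF k-1 plays, so playback stalls for whatever part of that work outlasts the playback of GoF k-1. This may differ from the paper's exact formula, and all names are hypothetical.

```python
def download_time(size_mbit, bandwidth_mbps):
    """Seconds needed to download one GoF of the given size."""
    return size_mbit / bandwidth_mbps


def playback_time(num_frames, fps):
    """Seconds needed to play one GoF."""
    return num_frames / fps


def stall_durations(sizes_mbit, bandwidths_mbps, decode_s, frames, fps):
    """Per-GoF stall in seconds under an assumed one-GoF playback buffer."""
    total = [download_time(s, b) + d
             for s, b, d in zip(sizes_mbit, bandwidths_mbps, decode_s)]
    play = [playback_time(n, fps) for n in frames]
    # Assumption: the start-up delay of the first GoF is not counted as stall.
    stalls = [0.0]
    for k in range(1, len(total)):
        stalls.append(max(0.0, total[k] - play[k - 1]))
    return stalls
```

For example, a 10-frame GoF at 30 fps plays for about 0.33 s, so any following GoF whose download plus decoding takes longer than that produces a stall under this rule.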

Assuming that the playback device only caches one GoF’s data, then the stall duration can be expressed as:


IV-B2 Quality Switch

We define quality switch as the absolute value of the difference between quality level of two adjacent GoFs, which can be expressed as:


where is the tiles set inside user’s FoV when GoF is played. represents the distance between tile and the user’s viewpoint, and is the diagonal length of Bounding-Box for GoF where the tile is located. As shown in Equ. 13, the quality switch is multiplied by a weight representing the viewport distance effect, which is the ratio of to . Obviously, the farther the tile is from the viewpoint, the smaller the quality switch will be, that is, the smaller the quality switch affects the user’s QoE, and vice versa.
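A distance-weighted quality switch can be sketched as below. The exact form of the weight is an assumption: it is taken here as the ratio of the bounding-box diagonal to the tile's distance from the viewpoint, averaged over the tiles in the FoV, which matches the stated behaviour that farther tiles contribute less. The function name and the averaging step are hypothetical.

```python
def quality_switch(level_prev, level_curr, tile_distances, bbox_diagonal):
    """Distance-weighted quality switch between two adjacent GoFs.

    tile_distances: distance from the viewpoint to each tile in the FoV (> 0).
    bbox_diagonal: diagonal length of the GoF bounding box.
    """
    base = abs(level_curr - level_prev)
    # Assumed weight: nearer tiles (small distance) weigh more.
    weights = [bbox_diagonal / d for d in tile_distances]
    return base * sum(weights) / len(weights)
```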

IV-C Point cloud video processing

In point cloud video streaming, we need to solve the following multi-objective optimization problem by trying to maximize the quality level while minimizing stall duration and quality switch.


During the experiment setup, we select four volumetric videos of 300 frames [3]: longdress, loot and soldier with lower resolution, and Basketball player with higher resolution. Each volumetric video is divided into 30 GoFs, each containing 10 frames and divided into 12 tiles. We use the MPEG PCC TMC2 software (https://github.com/MPEGGroup/mpeg-pcc-tmc2) to generate up to 5 quality level representations. For longdress and loot, we set four different network bandwidths Mb/s. While for soldier and Basketball player, we set another four bandwidths Mb/s. We also use two FoV prediction schemes, perfect and most recent. Perfect means that FoV data is obtained in real time from the user's HMD; most recent uses the perfect FoV data delayed by one GoF.

We select two groups of subjects who use the HTC vive HMD to view the original complete point cloud video. While the first group views the video, we record their FoV data, which is then used for the viewing of the second group.

Thus we get 32 different video configurations (4 videos × 4 bandwidths × 2 prediction schemes), and then solve the optimization problem with a simple brute-force search method. With the resulting optimization variable, we select the compression level of all tiles in the user's FoV, and then merge them into a new GoF based on the FoV information. Finally, all processed point cloud videos are obtained, and the PCPSNR, stall duration and quality switch of all videos are calculated. The detailed results are as follows.
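One way to picture the brute-force selection is the per-GoF enumeration sketched below. It is a simplification under stated assumptions, not the authors' solver: the multi-objective problem is collapsed into a single score with hypothetical weights `w_stall` and `w_switch`, the quality level index stands in for objective quality, and levels are chosen greedily one GoF at a time.

```python
def select_levels(sizes, decode, bandwidth, play_s, w_stall=1.0, w_switch=0.5):
    """Pick one quality level per GoF by enumerating all levels.

    sizes[k][r], decode[k][r]: data size (Mbit) and decoding time (s) of
    GoF k at quality level r. bandwidth is in Mb/s, play_s is the playback
    time of one GoF in seconds. The weights are hypothetical.
    """
    chosen = []
    for k in range(len(sizes)):
        best_level, best_score = None, None
        for r in range(len(sizes[k])):
            total = sizes[k][r] / bandwidth + decode[k][r]
            stall = max(0.0, total - play_s)
            switch = abs(r - chosen[-1]) if chosen else 0
            # Higher level is better; stall and switch are penalised.
            score = r - w_stall * stall - w_switch * switch
            if best_score is None or score > best_score:
                best_level, best_score = r, score
        chosen.append(best_level)
    return chosen
```

With ample bandwidth the highest level wins for every GoF, and with scarce bandwidth the lowest level wins, which is consistent with the near-zero quality switch reported at the two bandwidth extremes.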

IV-C1 Bandwidth versus PCPSNR and Stall

As shown in Fig. 5, as the network bandwidth increases, the PCPSNR of the four videos increases continuously, which indicates that the objective quality of the videos becomes higher. Fig. 6 shows that the stall duration of the four videos decreases as bandwidth increases. In the case of most recent FoV analysis, because the streaming scheme selects higher quality at a larger bandwidth, the stall time of the point cloud video at 340 Mb/s is slightly higher than at 230 Mb/s.

Fig. 5: The relationship between network bandwidth and PCPSNR for (a) longdress, (b) loot, (c) soldier and (d) Basketball player: perfect FoV prediction (blue square) and most recent FoV prediction (brown triangle).
Fig. 6: The relationship between network bandwidth and Stall for (a) longdress, (b) loot, (c) soldier and (d) Basketball player: perfect FoV prediction (blue square) and most recent FoV prediction (brown triangle).

IV-C2 Bandwidth and Quality Switch

As shown in Fig. 7, the quality switch of the four videos does not change monotonically as bandwidth increases; it is larger at the two middle bandwidths than at the two extremes. This is because, to make the subjective experiment more comprehensive, the system assigns each GoF the lowest quality at the minimum bandwidth and the highest quality at the maximum bandwidth, so there is almost no quality switch at either extreme.

Fig. 7: The relationship between bandwidth and Quality Switch for (a) longdress, (b) loot, (c) soldier and (d) Basketball player: perfect FoV prediction (blue square) and most recent FoV prediction (brown triangle).

Fig. 8 shows the original and generated versions of one single frame in the four point cloud videos. We use HTC vive as the HMD to play the processed point cloud videos through the compiled Unity program, and conduct user’s subjective experiments.

Fig. 8: Illustration of the original and generated point cloud video, from top to bottom: Basketball player, longdress, loot, soldier.

V Subjective experiment

This section describes our subjective experiment in detail.

Experimental apparatus and environment: We use ITU-R BT.500-14 (https://www.itu.int/rec/R-REC-BT.500-14-201910-I/en) as the reference standard for our subjective experiments. We design a point cloud video player based on Unity and ask subjects to wear the HTC vive HMD to view the generated point cloud videos, as shown in Fig. 9.

Fig. 9: Subjects are using the HTC vive HMD for training and evaluating point cloud videos: (a) training procedure, (b) evaluating procedure.
Fig. 10: Point cloud video player during training and evaluating: (a) player during training, (b) player during evaluating.

Participants: We select 34 subjects from our school to participate in the subjective evaluation experiment, including 25 males and 9 females. To make the subject pool more comprehensive, 6 of them are experts in point cloud video, and the remaining 28 participate in a point cloud video experiment for the first time.

Training and evaluating: The subjects first train on and then evaluate the point cloud videos during the experiment. Fig. 9 shows the training and evaluating scenarios, and Fig. 10 shows the corresponding user interfaces (UI) of the point cloud video player. Subjects participating in a point cloud video experiment for the first time may not be familiar with point cloud video, so they are given some time to watch examples. Specifically, we train the first-time subjects by having them watch point cloud videos of different contents and qualities, while point cloud video experts are not trained. After a short rest, all subjects evaluate the received point cloud videos using the double-stimulus impairment scale (DSIS) method of ITU-R BT.500-14. 32 pairs of processed point cloud videos and corresponding original point cloud videos are played in random order. Subjects first view the original reference point cloud video and then view the received point cloud video.

Scoring: After viewing the videos, the subjects score the processed point cloud videos. The score is set from 1 to 5, divided into five grades: excellent, good, average, poor and exceptionally poor. For each generated point cloud video, we calculate the mean of all user opinion scores. The mean opinion score (MOS) represents the overall user QoE of each processed point cloud video.
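The MOS computation described above is a per-video average of the individual scores, as in this minimal sketch. The dictionary layout and function name are assumptions for illustration.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each video id to the mean of its subjects' scores (1 to 5).

    ratings: {video_id: [score given by each subject]}
    """
    return {video: mean(scores) for video, scores in ratings.items()}
```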

VI Experimental results and analysis

After the subjective experiment, we can analyze the relationship between the subjective score MOS and PCPSNR, stall time and quality switch to obtain the specific QoE model. We also make the subjective experiment MOS data public (https://github.com/Siwxw/Volumetric-video-streaming-subjective-experiment-database).

VI-A The relationship between MOS and PCPSNR, stall and quality switch

Fig. 11: The relationship between network bandwidth and MOS for (a) longdress, (b) loot, (c) soldier and (d) Basketball player: perfect FoV prediction (blue square) and most recent FoV prediction (brown triangle).

Fig. 11 shows the relationship between bandwidth and MOS. The MOS of all four videos increases with bandwidth, as higher bandwidth enables the selection of higher quality point cloud video and at the same time leads to lower stall time, which results in a better user experience. This also indicates that the subjects are more sensitive to changes in PCPSNR and stall than to changes in quality switch, because the large variation in quality switch does not alter the increasing trend of MOS.

The MOS of , , and grow slowly in the middle two bandwidths compared to the two sides. This is because the quality switch is very high in the middle bandwidth setting, resulting in lower user MOS. However, and have a big change in MOS when the bandwidth changes from 70 Mb/s to 120 Mb/s, which is caused by a large reduction in stall duration. From this analysis, we can see that subjects are more concerned about the negative effects of stall than quality switch. The MOS and stall of are relatively correlated, which is because the PCPSNR and quality switch of change slightly with the increase of bandwidth.

From Fig. 11, we can observe that PCPSNR, i.e., objective quality, is positively correlated with MOS, while stall and quality switch are negatively correlated with MOS. This result agrees with our analysis of the influencing factors of QoE in Section III, which supports the validity of our QoE modeling analysis. We also find that subjects are more concerned about the effects of PCPSNR and stall than about quality switch.

VI-B Function fitting and correlation verification of MOS model

To obtain a specific QoE metric, we fit a function to the experimental data, which yields a mathematical model relating MOS to the three influencing factors. The fitting procedure is divided into three parts, as follows.

VI-B1 Sample scatter diagram

We draw scatter plots of the dependent variable MOS against each of the independent variables PCPSNR, stall and quality switch, as shown in Fig. 12. The points are roughly distributed along a straight line, so linear regression can be adopted.

Fig. 12: Scatter plots between the MOS and PCPSNR, stall and quality switch.

VI-B2 Multiple linear regression

To obtain the final QoE model, we use 60% of the MOS data for training to fit the mathematical model, and the remaining data for validation. During training, we use SPSS software for data analysis, and the results are shown in Table I.

                  Coefficient   t         p         VIF
Constant          -0.471        -0.479    0.639     -
PCPSNR             0.037         4.585    0.000**   1.703
Stall             -0.313        -6.83     0.000**   1.788
Quality Switch    -0.007        -5.667    0.000**   1.331

F: F(3,15) = 66.948, p = 0.000
D-W: 1.738
Dependent variable: MOS

Table I: Results of linear regression analysis
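The fitting step can be sketched without SPSS. The block below is only an illustration: it swaps in a plain normal-equations solver for ordinary least squares, and it runs on synthetic, noise-free samples generated from the Table I coefficients, not on the paper's MOS data. The function names, the value ranges and the seeded random split are all assumptions, since the text does not say how the 60%/40% split was drawn.

```python
import random

# Coefficients used only to generate synthetic, noise-free targets for this
# sketch (intercept, PCPSNR, stall, quality switch). They mirror Table I, but
# the samples themselves are NOT the paper's data.
TRUE_COEFFS = [-0.471, 0.037, -0.313, -0.007]


def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def split_60_40(samples, seed=0):
    """Shuffle a copy of the samples and split it 60% / 40% (assumed random)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# 32 synthetic samples: (PCPSNR, stall, quality switch) in made-up ranges.
rng = random.Random(1)
features = [
    (rng.uniform(60, 120), rng.uniform(0, 5), rng.uniform(0, 100))
    for _ in range(32)
]
samples = [
    (f, TRUE_COEFFS[0] + sum(c * v for c, v in zip(TRUE_COEFFS[1:], f)))
    for f in features
]

train, validation = split_60_40(samples)
coeffs = fit_ols([f for f, _ in train], [y for _, y in train])
```

Because the synthetic targets contain no noise, the fit recovers the generating coefficients by construction, which only checks the solver. With 32 samples, a 60% split gives 19 training samples, which is consistent with the F(3,15) degrees of freedom reported in Table I (19 - 3 - 1 = 15).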

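Thus the QoE metric can be obtained from the coefficients in Table I as follows:

$$\mathrm{MOS} = -0.471 + 0.037\,\mathrm{PCPSNR} - 0.313\,\mathrm{Stall} - 0.007\,\mathrm{QualitySwitch}$$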


The R-square value of the model is 0.931, indicating that PCPSNR, stall and quality switch explain 93.1% of the variation of QoE. The model passes the F-test (F(3,15) = 66.948, p < 0.001), indicating that at least one of PCPSNR, stall and quality switch has an influence on QoE. In addition, according to the multi-collinearity test of the model, all VIF (variance inflation factor) values are below 2 (the largest is 1.788), so there is no collinearity problem. Moreover, the D-W value of 1.738 is near 2, which indicates that the residuals are not auto-correlated, i.e., there is no correlation between the samples used for training.
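For reference, the Durbin-Watson statistic behind the D-W value can be sketched as follows. It is computed from the regression residuals in sample order; the statistic ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation. The function name and the toy residual sequences are hypothetical and unrelated to the paper's data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(e * e for e in residuals)
    if denom == 0:
        raise ValueError("residuals are all zero")
    numer = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return numer / denom


# Toy sequences chosen to show the two extremes of the statistic.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # negative autocorrelation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])  # positive autocorrelation
```

The alternating sequence pushes the statistic above 2 and the constant sequence pushes it to 0, so a value such as 1.738 sits close to the uncorrelated case.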

The final analysis shows that the regression coefficient of PCPSNR is 0.037 (t = 4.585, p < 0.001), indicating that PCPSNR has a significant positive influence on MOS. The regression coefficient of stall is -0.313 (t = -6.83, p < 0.001), indicating that stall has a significant negative influence on MOS. The regression coefficient of quality switch is -0.007 (t = -5.667, p < 0.001), indicating that quality switch has a significant negative influence on MOS.
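The coefficients in Table I can be turned into a small predictor. This is a sketch only: the function and constant names are hypothetical, the inputs must use the same units and definitions as PCPSNR, stall and quality switch in Section III, and the example input values are made up for illustration.

```python
# Regression coefficients copied from Table I.
INTERCEPT = -0.471
COEF_PCPSNR = 0.037
COEF_STALL = -0.313
COEF_QUALITY_SWITCH = -0.007


def predict_mos(pcpsnr, stall, quality_switch):
    """Predicted MOS from the fitted linear model (no clamping to 1-5)."""
    return (
        INTERCEPT
        + COEF_PCPSNR * pcpsnr
        + COEF_STALL * stall
        + COEF_QUALITY_SWITCH * quality_switch
    )


# Hypothetical operating point, not a measurement from the experiment.
example = predict_mos(pcpsnr=100.0, stall=1.0, quality_switch=50.0)
```

Raising PCPSNR increases the prediction, while raising stall or quality switch lowers it, which matches the signs discussed above.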

VI-B3 Model validation

To further verify the MOS model, we use the remaining 40% of the data: we calculate the predicted MOS with the fitted model and then compare it with the subjective MOS, as shown in Fig. 13. The goodness-of-fit R-square value is 0.924, indicating that the fitted MOS model predicts the held-out subjective scores well.

Fig. 13: Correlation between subjective MOS and predicted MOS calculated by the fitting MOS model.
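A goodness-of-fit value for such a scatter plot can be sketched as the squared Pearson correlation between subjective and predicted MOS. This is an assumption: the text does not state which R-square definition was used for Fig. 13. The function names and the five score pairs are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


def r_squared(subjective, predicted):
    """Squared Pearson correlation, used here as the goodness of fit."""
    return pearson_r(subjective, predicted) ** 2


# Hypothetical held-out pairs of subjective and predicted MOS.
subjective = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted = [2.0, 2.4, 3.3, 3.7, 4.5]
r2 = r_squared(subjective, predicted)
```

Applied to the 40% validation split, this would take the subjective MOS values and the model's predictions for the same videos as its two inputs.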

In general, the results inferred by the two statistical methods above are consistent. They indicate a significant linear correlation between the dependent variable MOS and the independent variables, and show that the model can predict users' QoE for point cloud video during volumetric video streaming well.

VII Conclusion

We have studied the QoE model of volumetric video streaming, which takes into account three factors closely related to the user's QoE: the objective quality of the point cloud, and the stall and quality switch that occur during video streaming. To calculate the point cloud video quality properly, we combine the position and color information of points to extend the objective quality metric of the point cloud, and take the viewport distance effect into account to model an accurate QoE metric. Through detailed double-stimulus subjective experiments, we obtain the final QoE metric by analyzing the subjective MOS data. We find that objective quality is positively correlated with QoE, while stall and quality switch are negatively correlated with QoE. In addition, subjects are more concerned about the effects of PCPSNR and stall than about quality switch. The experiments show that our QoE model can predict the user's subjective quality of experience in volumetric video streaming well, which can inform subsequent QoE research and the optimization of volumetric video streaming.


  • [1] I. BT (2015) 709-6,“. Parameter values for the HDTV standards for production and international programme exchange,” Jun. Cited by: §III-B2.
  • [2] A. Clemm, M. T. Vega, H. K. Ravuri, T. Wauters, and F. De Turck (2020) Toward truly immersive holographic-type communication: challenges and solutions. IEEE Communications Magazine 58 (1), pp. 93–99. Cited by: §I.
  • [3] d’Eon,Eugene, Harrison,Bob, Myers,Taos, and P. Chou (2017-01) 8i voxelized full bodies-a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, Geneva. Cited by: §I, §IV-C.
  • [4] R. I. T. da Costa Filho, M. C. Luizelli, M. T. Vega, J. van der Hooft, S. Petrangeli, T. Wauters, F. De Turck, and L. P. Gaspary (2018) Predicting the performance of virtual reality video streaming in mobile networks. pp. 270–283. Cited by: §II-C.
  • [5] E. Dumic and L. A. da Silva Cruz (2020) Point cloud coding solutions, subjective assessment and objective measures: a case study. Symmetry 12 (12), pp. 1955. Cited by: §I.
  • [6] L. Gao, H. Bai, G. Lee, and M. Billinghurst (2016) An oriented point-cloud view for mr remote collaboration. New York, NY, USA. External Links: ISBN 9781450345514 Cited by: §I.
  • [7] B. Han, Y. Liu, and F. Qian (2020) ViVo: visibility-aware mobile volumetric video streaming. In Proceedings of the 26th Annual International Conference on Mobile Computing and NetworkingICC 2020 - 2020 IEEE International Conference on Communications (ICC)2017 IEEE International Conference on Image Processing (ICIP)2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)Proceedings of the 9th ACM Multimedia Systems Conference2018 IEEE Global Communications Conference (GLOBECOM)IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)SIGGRAPH ASIA 2016 Mobile Graphics and Interactive Applications, MobiCom ’20SA ’16, Vol. , New York, NY, USA. External Links: ISBN 9781450370851, Link, Document Cited by: §II-A.
  • [8] M. Hosseini and C. Timmerer (2018) Dynamic adaptive point cloud streaming. In Proceedings of the 23rd Packet Video Workshop, pp. 25–30. Cited by: §I.
  • [9] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso (2017) Subjective and objective quality evaluation of 3d point cloud denoising algorithms. pp. 1–6. Cited by: §III-B1.
  • [10] J. Li, R. Feng, Z. Liu, W. Sun, and Q. Li (2018) Modeling qoe of virtual reality video transmission over wireless networks. pp. 1–7. Cited by: §II-C, §III-A.
  • [11] J. Li, C. Zhang, Z. Liu, W. Sun, W. Hu, and Q. Li (2020) Demo abstract: narwhal: a dash-based point cloud video streaming system over wireless networks. pp. 1326–1327. External Links: Document Cited by: §I, §III-A.
  • [12] J. Li, C. Zhang, Z. Liu, W. Sun, and Q. Li (2020) Joint communication and computational resource allocation for qoe-driven point cloud video streaming. pp. 1–6. External Links: Document Cited by: §II-A, §III-A.
  • [13] Y. Liu, S. Dey, F. Ulupinar, M. Luby, and Y. Mao (2015) Deriving and validating user experience model for dash video streaming. IEEE Transactions on Broadcasting 61 (4), pp. 651–665. Cited by: §II-C.
  • [14] Z. Liu, Q. Li, X. Chen, C. Wu, J. Li, Y. Ji, et al. (2021) Point cloud video streaming: challenges and solutions. IEEE Network. Cited by: §I.
  • [15] R. Mekuria, Z. Li, C. Tulvan, and P. Chou (2016) Evaluation criteria for pcc (point cloud compression). output document n 16332, iso. Technical report IEC JTC1/SC29/WG11 MPEG. Cited by: §II-B.
  • [16] R. Mekuria, K. Blom, and P. Cesar (2016) Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology 27 (4), pp. 828–842. Cited by: §II-A.
  • [17] J. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand (2012) Comparison of the coding efficiency of video coding standards—including high efficiency video coding (hevc). IEEE Transactions on circuits and systems for video technology 22 (12), pp. 1669–1684. Cited by: §III-B2.
  • [18] R. Pagés, K. Amplianitis, D. Monaghan, J. Ondřej, and A. Smolić (2018) Affordable content creation for free-viewpoint video and vr/ar applications. Journal of Visual Communication and Image Representation 53, pp. 192–201. Cited by: §I.
  • [19] J. Park, P. A. Chou, and J. Hwang (2018) Volumetric media streaming for augmented reality. In 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. Cited by: §I.
  • [20] J. Park, P. A. Chou, and J. Hwang (2019) Rate-utility optimized streaming of volumetric media for augmented reality. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9 (1), pp. 149–162. External Links: Document Cited by: §II-A.
  • [21] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. A. Chou, R. A. Cohen, M. Krivokuća, S. Lasserre, Z. Li, et al. (2018) Emerging mpeg standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9 (1), pp. 133–148. Cited by: §I, §II-A.
  • [22] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro (2017) Geometric distortion metrics for point cloud compression. pp. 3460–3464. Cited by: §II-B, §III-B1.
  • [23] J. van der Hooft, M. T. Vega, C. Timmerer, A. C. Begen, F. De Turck, and R. Schatz (2020) Objective and subjective qoe evaluation for adaptive point cloud streaming. pp. 1–6. Cited by: §II-B.
  • [24] J. van der Hooft, T. Wauters, F. De Turck, C. Timmerer, and H. Hellwagner (2019) Towards 6dof http adaptive streaming through point cloud compression. In Proceedings of the 27th ACM International Conference on Multimedia, pp. 2405–2413. Cited by: §I.
  • [25] I. Viola, S. Subramanyam, and P. Cesar (2020) A color-based objective quality metric for point cloud contents. pp. 1–6. Cited by: §II-B.
  • [26] L. Wang, C. Li, W. Dai, J. Zou, and H. Xiong (2021) QoE-driven and tile-based adaptive streaming for point clouds. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1930–1934. Cited by: §I.
  • [27] S. Winkler and P. Mohandas (2008) The evolution of video quality measurement: from psnr to hybrid metrics. IEEE transactions on Broadcasting 54 (3), pp. 660–668. Cited by: §I.
  • [28] E. Zerman, P. Gao, C. Ozcinar, and A. Smolic (2019) Subjective and objective quality assessment for volumetric video compression. Electronic Imaging 2019 (10), pp. 323–1. Cited by: §II-B.