With the emergence and growing popularity of virtual reality (VR) products such as Oculus Rift, Gear VR, and HTC Vive, video content is becoming one of the most important applications for VR products. To represent content from all directions and create a fully immersive experience, a VR video needs to contain information from all 360 degrees. Therefore, the VR video, also known as 360-degree video, needs a very high spatial resolution to maintain relatively good visual quality. Such high-resolution videos pose many challenges to video compression technologies, and the need to develop dedicated compression methods for these videos has become urgent.
Since the original 360-degree video is a sphere, it is always projected to a 2-D format for compression so that modern video coding standards such as H.264/Advanced Video Coding (AVC) and H.265/High Efficiency Video Coding (HEVC) can be applied. Many projection methods have been investigated, such as the equirectangular format and polyhedron formats including cube map, octahedron, and icosahedron. Compared with the equirectangular format, the polyhedron formats present less geometric distortion and can therefore lead to better coding efficiency. However, the polyhedron formats also have a disadvantage: obvious texture discontinuities exist in the areas near the face boundaries. These discontinuities can be divided into two kinds, both of which are shown in Fig. 1 for the typical cubic format. One kind is caused by unfolding the faces from the 3-D cube to the 2-D image, and is marked by the green rectangles. The other kind is caused by projecting the sphere onto different planes (or faces) of the cube, and is marked by the red rectangles. When a motion vector (MV) crosses the face boundary, the current motion compensation (MC) scheme obtains an unreasonable prediction block with an obvious texture discontinuity, which leads to a serious decrease in coding efficiency.
In the current standard-based video coding scheme, a simple padding scheme, which extends the picture boundary pixels beyond the picture, is implemented in the HEVC reference software both to ensure that the picture size is a multiple of the coding unit size and to prevent the MC operation from referencing undefined samples outside the picture boundary. Li et al. have also tried to optimize the padding scheme for pictures of arbitrary size using fundamental rate distortion optimization (RDO) theory. However, since these schemes only consider the picture itself and ignore the spherical nature of 360-degree video, they are not the best way to solve the problem of texture discontinuity at the face boundary.
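The boundary-extension padding described above can be illustrated with a minimal sketch. It assumes a picture stored as a list of rows and uses a hypothetical helper name, `padded_sample`; it is not taken from the HEVC reference software, and it only shows the idea of replicating the nearest boundary pixel for out-of-picture coordinates.

```python
def padded_sample(picture, x, y):
    """Return the sample at (x, y), replicating the nearest boundary
    pixel when (x, y) lies outside the picture (clamp-to-edge padding).

    `picture` is a list of rows; this layout is an assumption of the sketch.
    """
    height = len(picture)
    width = len(picture[0])
    clamped_x = min(max(x, 0), width - 1)   # clamp the column index
    clamped_y = min(max(y, 0), height - 1)  # clamp the row index
    return picture[clamped_y][clamped_x]


picture = [
    [10, 20, 30],
    [40, 50, 60],
]
# Coordinates left of the picture reuse the first column of the same row.
left_of_picture = padded_sample(picture, -2, 0)
# Coordinates beyond the bottom-right corner reuse the corner pixel.
beyond_corner = padded_sample(picture, 5, 5)
```

Because the padded values depend only on the nearest boundary pixel, a reference block that crosses a cube-face boundary receives no information from the geometrically adjacent face, which is the limitation the paper sets out to address.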
Therefore, in this paper, to better solve the problem of texture discontinuity at the face boundary, we make full use of the information available in the 360-degree video. To be more specific, we first place the neighboring faces in suitable positions around the current face to keep approximate texture continuity. Then we propose a co-projection-plane based 3-D padding method that projects the reference pixels in the neighboring faces onto the plane of the current face to guarantee exact texture continuity. Under the proposed scheme, the reference pixel is always projected to the same plane as the current pixel when performing MC, so that the texture discontinuity problem at the face boundary can be solved.
This paper is organized as follows. Section 2 gives a brief introduction to the polyhedron projection. The proposed co-projection-plane based 3-D padding method is described in detail in Section 3. The experimental results are presented in Section 4. Section 5 concludes the paper.
2 A brief introduction of the polyhedron projection
As its name implies, polyhedron projection projects the inscribed sphere (the 360-degree video) onto each face of a polyhedron such as a cube, octahedron, or icosahedron. As a typical example, the projection process from the inscribed sphere to the cube map is shown in Fig. 2. For each point on a face of the cube, a line is drawn between the center of the sphere and that point. The line intersects the sphere at one point, and the pixel value at this intersection point is used as the value of the point on the cube face. Since the intersection point may not lie at an integer sampling position of the sphere, its pixel value is interpolated from the surrounding integer pixels. To be more specific, the luma component is interpolated using the Lanczos3 interpolation filter, and the chroma components are interpolated using the Lanczos2 interpolation filter.
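The geometric part of this projection can be sketched as follows. The axis convention (y up, z pointing to the front face), the normalized face coordinates in [-1, 1], and the function name `face_point_to_sphere` are all assumptions made for illustration; the sketch stops at the spherical coordinates and leaves out the Lanczos interpolation used to fetch the actual pixel value.

```python
import math


def face_point_to_sphere(u, v, face_center, u_axis, v_axis):
    """Map a point on a cube face to (longitude, latitude) in radians.

    (u, v) are face coordinates in [-1, 1]; face_center, u_axis and v_axis
    are 3-D vectors describing the face for a cube with half-edge length 1
    centered at the origin (assumed convention: y is up, z is front).
    """
    # 3-D position of the point on the cube face.
    px = face_center[0] + u * u_axis[0] + v * v_axis[0]
    py = face_center[1] + u * u_axis[1] + v * v_axis[1]
    pz = face_center[2] + u * u_axis[2] + v * v_axis[2]
    # The line from the center through the point meets the unit sphere
    # at the normalized direction vector.
    norm = math.sqrt(px * px + py * py + pz * pz)
    sx, sy, sz = px / norm, py / norm, pz / norm
    longitude = math.atan2(sx, sz)
    latitude = math.asin(sy)
    return longitude, latitude


# Front face under the assumed convention.
FRONT = ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
center_lon, center_lat = face_point_to_sphere(0.0, 0.0, *FRONT)
edge_lon, edge_lat = face_point_to_sphere(1.0, 0.0, *FRONT)
```

In this sketch the face center maps to longitude 0 and latitude 0, and the midpoint of the right edge of the front face maps to a longitude of 45 degrees, which matches the 90-degree field of view covered by each cube face.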
After the projection from the sphere to the polyhedron, the polyhedron is unfolded to obtain the 2-D image for compression. There are various unfolding methods for a polyhedron, including non-compact and compact ones. For the cube map projection in particular, as shown in Fig. 3, two unfolding formats that place the faces in different positions are mainly used. In the following sections, the cube map projection is used as an example to introduce the proposed co-projection-plane based 3-D padding method.
3 The proposed co-projection-plane based 3-D padding
The proposed co-projection-plane based 3-D padding method is introduced in three parts. In subsection 3.1, we place the corresponding neighboring faces in suitable positions as the extension of the current face to keep approximate texture continuity. In subsection 3.2, we project the reference pixels in the neighboring faces onto the current face to guarantee exact texture continuity. Finally, in subsection 3.3, we introduce some implementation details.
3.1 Approximate texture continuity
As each face of a cube has four edges, to achieve approximate texture continuity, we should first make all four neighboring faces of the current face available. As shown in Fig. 3 (a), the front face has three neighboring faces, the right and rear faces have two neighboring faces, and the top, bottom, and left faces have only one neighboring face. We complement every face so that all four of its neighboring faces are present. Using the right face as an example, besides the existing front and rear faces, we add the top and bottom faces as neighbors of the current face. The complementation result is shown in Fig. 4 (a), and the actual result for a typical sequence is presented in Fig. 4 (b).
As can be seen from Fig. 4 (b), the complementation result still presents an obvious texture discontinuity at the common edges between the center face and the top and bottom faces. The main reason is that the common edges of the neighboring faces are not aligned. To align the common edges, the top face should be rotated by 90 degrees clockwise, and the bottom face should be rotated by 90 degrees anticlockwise. The final approximate texture continuity results are shown in Fig. 5. The above process is a typical example for the right face, and the other faces can be handled in a similar way to achieve approximate texture continuity.
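The rotation step for the right face can be sketched as below. Faces are assumed to be square lists of rows, and the helper names `rotate_cw`, `rotate_ccw`, and `complement_vertically` are hypothetical. The rotation directions follow the right-face example in the text; other faces would need their own rotations depending on the unfolding layout.

```python
def rotate_cw(face):
    """Rotate a face (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*face[::-1])]


def rotate_ccw(face):
    """Rotate a face (list of rows) by 90 degrees anticlockwise."""
    return [list(row) for row in zip(*face)][::-1]


def complement_vertically(top, center, bottom):
    """Stack the rotated top face, the current face, and the rotated
    bottom face so that their common edges line up (right-face example)."""
    return rotate_cw(top) + [list(row) for row in center] + rotate_ccw(bottom)


top = [[1, 2], [3, 4]]
center = [[5, 6], [7, 8]]
bottom = [[9, 10], [11, 12]]
column = complement_vertically(top, center, bottom)
# column == [[3, 1], [4, 2], [5, 6], [7, 8], [10, 12], [9, 11]]
```

The two rotations are inverses of each other, so applying `rotate_ccw` to the output of `rotate_cw` returns the original face.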
3.2 Exact texture continuity
After approximate texture continuity is achieved, a careful look at Fig. 5 (b) still shows that straight lines on the car become broken lines when crossing the face boundary. This is mainly caused by the cube map projection from the inscribed sphere to different faces. Therefore, in this subsection, we propose a co-projection-plane based 3-D padding method to achieve exact texture continuity.
As shown in Fig. 6, under the co-projection-plane based 3-D padding method, we extend the current face into a larger one , and the values of the extended pixels are determined by projecting the neighboring faces, which are generated in the approximate-texture-continuity step, onto the current face. Using the bottom face as an example, for a point in the extended zone of the bottom face, assume that the top left position is , the position in the extension face is , the face extension range is , and the edge length of the cube is . Then the lengths of and can be calculated as
Therefore, according to the principle of similar triangles, we can obtain the length of as
Similarly, we can also obtain the length of as
In this way, the coordinate of the corresponding position in the right face can be derived. The other projection positions of the neighboring faces can be derived in a similar way.
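The similar-triangles argument can be illustrated with a small sketch. The notation here is not the paper's: `d`, `v`, `a`, and the function name `project_extension_to_neighbor` are assumptions introduced only for this example. The cube is centered at the origin, the extended point lies in the plane of the current face at distance `d` beyond the shared edge, and `v` is its offset along that edge measured from the edge midpoint.

```python
def project_extension_to_neighbor(d, v, a):
    """Project a point of the extended zone onto the neighboring face.

    d: distance of the point beyond the shared edge, measured in the
       plane of the current face (d >= 0).
    v: offset of the point along the shared edge, from the edge midpoint.
    a: edge length of the cube.

    Returns (depth, v_n): the distance from the shared edge into the
    neighboring face and the offset along the edge on that face.
    """
    half = a / 2.0
    # The line from the cube center to the point meets the neighboring
    # face after being scaled by half / (half + d) (similar triangles).
    scale = half / (half + d)
    depth = half * (1.0 - scale)  # equals half * d / (half + d)
    v_n = v * scale
    return depth, v_n


# A point one half-edge beyond the boundary of a cube with edge length 2.
depth, v_n = project_extension_to_neighbor(d=1.0, v=1.0, a=2.0)
# depth == 0.5 and v_n == 0.5
```

Under these assumptions the projected point stays inside the neighboring face only while the absolute value of `v_n` does not exceed `a / 2`; points near the corners of the extended zone fall onto a different face and would need separate handling. The value `depth` never reaches `a / 2`, which reflects that an unbounded extension of the current plane covers only half of the neighboring face.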
It should be noted that the calculated coordinates may not always fall on integer positions. In the current implementation, bilinear interpolation is used to interpolate the pixels at fractional positions. It should also be mentioned that the pixels belonging to lines , , , and will be projected to the common edges of two neighboring faces. If bilinear interpolation were still used there, the final pixel values would be interpolated from neighboring pixels coming from two different faces, which is unreasonable. In our implementation, the pixels belonging to lines , , , and are derived as the average of the neighboring pixels in the extended zones. After these operations, the interpolation results are shown in Fig. 7 (b). Compared with the results generated by the HEVC reference software shown in Fig. 7 (a), the proposed algorithm achieves exact texture continuity. Not only the gray zones but also the discontinuous face boundaries are filled with suitable values to guarantee exact texture continuity.
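A minimal sketch of bilinear interpolation at a fractional position is given below. It assumes a face stored as a list of rows and clamps coordinates to the face, and the name `bilinear_sample` is hypothetical. The special averaging rule for pixels that project onto a common edge is not included.

```python
import math


def bilinear_sample(face, x, y):
    """Bilinearly interpolate `face` (list of rows) at fractional (x, y)."""
    height = len(face)
    width = len(face[0])
    # Clamp to the valid sample range (an assumption of this sketch).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0 = int(math.floor(x))
    y0 = int(math.floor(y))
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx = x - x0  # horizontal fractional offset
    fy = y - y0  # vertical fractional offset
    top = face[y0][x0] * (1.0 - fx) + face[y0][x1] * fx
    bottom = face[y1][x0] * (1.0 - fx) + face[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


face = [
    [0, 10],
    [20, 30],
]
middle = bilinear_sample(face, 0.5, 0.5)  # average of the four samples: 15.0
```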
3.3 Implementation details
The proposed algorithm is implemented in the HEVC reference software. Our current implementation can be roughly divided into two parts and will not lead to any modification of the coding tools in the coding unit (CU) level. The first part is to get the extension for all the faces for the reference frames. To be more specific, after the encoding of the current frame is finished, if the current frame is a reference frame, the neighbor faces of all the faces will be first complemented using the method introduced in subsection 3.1 to generate the image similar to Fig. 5 (b). Then the method introduced in subsection 3.2 will be used to generate the extended faces similar to Fig. 7 (b) to achieve exact texture continuity.
The second part is to fill the reference frame with the face extension when encoding each CU. For example, when we are encoding a CU in the right face, we fill the right face extension into each reference frame for the current CU. The results are shown in Fig. 8. The whole frame looks discontinuous, but for the right face the texture is continuous within the predefined search range. After the CUs belonging to the current face have been coded, the reference frame is refilled with the original values and is ready to be filled with the extension of other faces later in the encoding process. It should be noted that in the decoding process the MV of the current CU is already known before the reference frame is used. Therefore, the decoder can determine from the MV whether the current CU needs the extension of the current face at all, which avoids unnecessary extension operations and reduces decoding complexity.
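The decoder-side decision can be sketched as a simple bounds check. This is an illustrative assumption rather than the authors' implementation: the function name `needs_face_extension`, the integer-pel MV, and the `margin` parameter (standing in for the extra samples a fractional-sample interpolation filter would read) are all hypothetical.

```python
def needs_face_extension(block_x, block_y, block_w, block_h,
                         mv_x, mv_y, face_rect, margin=0):
    """Return True if the reference block addressed by the MV leaves the face.

    face_rect is (left, top, right, bottom) in pixels, with right and
    bottom exclusive. The MV is given in integer pixels in this sketch.
    """
    left, top, right, bottom = face_rect
    ref_left = block_x + mv_x - margin
    ref_top = block_y + mv_y - margin
    ref_right = block_x + mv_x + block_w + margin
    ref_bottom = block_y + mv_y + block_h + margin
    return (ref_left < left or ref_top < top
            or ref_right > right or ref_bottom > bottom)


face_rect = (0, 0, 64, 64)
# A 16x16 block touching the right edge of the face.
stays_inside = needs_face_extension(48, 16, 16, 16, 0, 0, face_rect)
crosses_edge = needs_face_extension(48, 16, 16, 16, 4, 0, face_rect)
```

With a check of this kind, a CU whose reference block stays inside its own face can use the unmodified reference frame, so the extension only has to be filled in for CUs whose MV actually crosses a face boundary.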
4 Experimental results
The proposed co-projection-plane based 3-D padding method is implemented in the HEVC reference software HM-16.6 and compared with HEVC without the proposed algorithm. All the test conditions specified for inter frames, including random access (RA) main10, low delay (LD) main10, and low delay P (LDP) main10, are used. The quantization parameters (QP) tested in our experiments are 22, 27, 32, and 37, following the HEVC common test conditions. The face extension range is set as in our experiments. The BD-rate (Bjontegaard Delta rate) is used to measure the difference between the anchor and the proposed algorithm. In the current implementation, the Peak Signal to Noise Ratio (PSNR) is used to measure the quality of the reconstructed sequences relative to the original sequences. Quality metrics that are more suitable for 360-degree videos, such as WS-PSNR and S-PSNR, will be used as the quality measurements in our future work.
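For reference, the PSNR between an original and a reconstructed plane can be computed as in the following sketch. The function name `psnr` and the list-of-rows layout are assumptions; the peak value is a parameter because it depends on the bit depth of the sequences (255 for 8-bit samples).

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak Signal to Noise Ratio in dB between two equally sized planes."""
    squared_error = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for sample_o, sample_r in zip(row_o, row_r):
            squared_error += (sample_o - sample_r) ** 2
            count += 1
    mse = squared_error / count  # mean squared error
    if mse == 0:
        return math.inf  # identical planes
    return 10.0 * math.log10(peak * peak / mse)


original = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]
quality_db = psnr(original, reconstructed)
```

PSNR weights every sample of the 2-D picture equally, which is why sphere-aware metrics such as WS-PSNR and S-PSNR are mentioned as better suited to 360-degree content.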
We use the test sequences specified in to measure the performance of the proposed algorithm. To be more specific, we used the conversion tool specified in to convert the high-fidelity input test sequences in equirectangular format to the bit cubic format test sequences. The detailed information and characteristics of the test sequences are listed in Table 1. The number of frames tested corresponds to approximately second, as shown in Table 1.
| Sequence name | Resolution | Frame count |
The test results of the proposed algorithm in RA main10, LD main10, and LDP main10 are shown in Table 2, Table 3, and Table 4, respectively. From the test results, we can see that for the Y component, compared with the HEVC anchor, average R-D performance improvements of about , and are achieved in the RA, LD, and LDP cases, respectively. For the U and V components, average bitrate reductions of about , , and are observed accordingly. We can also see from these tables that for the sequence with relatively large motion, the maximum bitrate saving for the Y component can be as high as , , and in the RA, LD, and LDP cases, respectively.
Beyond the average and maximum bitrate reductions, the proposed algorithm leads to consistently better R-D performance for all the test sequences, even though no RDO-based selection between the proposed reference frame and the original reference frame is used in the proposed framework. This indicates that, for the tested sequences, the reference frame in the proposed framework leads to better or equivalent compression results compared with that in the original framework. However, the performance improvement varies with the characteristics of the sequences. For sequences with large motion at the face boundary, such as DrivingInCountry, MC frequently crosses the face boundary, so the proposed algorithm leads to a significant bitrate reduction. In contrast, for sequences with almost zero motion at the face boundary, such as Harbor, MC rarely crosses the face boundary, so the proposed algorithm cannot provide an obvious performance improvement.
Some typical R-D curves in various test conditions with different test sequences are shown in Fig. 9. The R-D curves also demonstrate that the proposed algorithm can lead to some performance improvement compared with HEVC anchor. Besides, from these typical R-D curves, we can also see that the proposed algorithm can lead to similar performance improvement for both high bitrate and low bitrate.
5 Conclusion
In this paper, we first point out the existence and influence of the serious texture discontinuities at the face boundaries in the polyhedron projection. Then we propose to place the corresponding neighboring faces in suitable positions as the extension of the current face to keep approximate texture continuity. After that, a co-projection-plane based 3-D padding method is proposed to project the reference pixels in the neighboring faces onto the current face to guarantee exact texture continuity. The proposed scheme is implemented in the reference software of High Efficiency Video Coding. Compared with the existing method in the High Efficiency Video Coding reference software, the proposed algorithm can bring averagely and maximum bitrate savings in different test conditions. The experimental results demonstrate that the texture discontinuity at the face boundary can be handled well by the proposed algorithm.
References
- T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
- G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
- Y. He, B. Vishwanath, X. Xiu, and Y. Ye, “AHG8: InterDigital’s projection format conversion tool,” Document JVET-D0021, Chengdu, CN, Oct. 2016.
- M. Zhou, “AHG8: A study on compression efficiency of cube projection,” Document JVET-D0022, Chengdu, CN, Oct. 2016.
- M. Zhou, “AHG8: A study on compression efficiency of icosahedral projection,” Document JVET-D0023, Chengdu, CN, Oct. 2016.
- “High Efficiency Video Coding test model, HM-16.6,” https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/, Accessed: 2016.
- M. Li, Y. Chang, F. Yang, and S. Wan, “Rate-distortion criterion based picture padding for arbitrary resolution video coding using H.264/MPEG-4 AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 9, pp. 1233–1241, Sept. 2010.
- “Lanczos resampling, Lanczos interpolation,” https://en.wikipedia.org/wiki/Lanczos_resampling, Accessed: 2016.
- G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” Document VCEG-M33, Austin, Texas, USA, April 2001.
- Y. Sun, A. Lu, and L. Yu, “AHG8: WS-PSNR for 360 video objective quality evaluation,” Document JVET-D0040, Chengdu, CN, Oct. 2016.
- M. Yu, H. Lakshman, and B. Girod, “A framework to evaluate omnidirectional video coding schemes,” in 2015 IEEE International Symposium on Mixed and Augmented Reality, Sept. 2015, pp. 31–36.
- J. Boyce, E. Alshina, A. Abbas, and Y. Ye, “JVET common test conditions and evaluation procedures for 360-degree video,” Document JVET-D1030, Chengdu, CN, Oct. 2016.