Virtual reality (VR) is often seen as one of the most important applications in 5G cellular systems [ABIQualcommVR:17, EjderVR:17, ParkWCL:18]. As in real life, mobile VR users can interact in the virtual space immersively with virtual objects that may stimulate their multiple sensory organs. This multimodal VR perception happens, for example, when a VR user measures the size of a virtual object through visual and haptic senses. In this study, we consider the problem of supporting such visuo-haptic VR perceptions over wireless cellular networks, and focus on the downlink design.
The key challenge is that these two perceptions have completely different cellular service requirements. In fact, visual traffic requires a high data rate and relatively low reliability with packet error rate (PER) on the order of [Shi:10, UR2Cspaswin:17]. These requirements can be supported mostly through enhanced mobile broadband (eMBB) links [ITU5G:15]. Haptic traffic, by contrast, should guarantee a fixed target rate and high reliability with PER on the order of [Steinbach:12, Zhang:18], which can be satisfied via ultra-reliable and low-latency communication (URLLC) links [PetarURLLC:17, MehdiURLLC:18].
Furthermore, in order to render a smooth multimodal experience, the PERs associated with the visuo-haptic VR perceptions should guarantee a target perceptual resolution. To be precise, the perceptual resolution is commonly measured by using the just-noticeable difference (JND) in psychophysics, a field of study that focuses on the quantitative relation between physical stimulus and perception [Ernst:2002aa, Shi:10, ShiHirche:16]. Following Weber’s law, JND describes the minimum detectable change in perceptual inputs, e.g., mm for the object size measurement using visuo-haptic perceptions [Ernst:2002aa]. According to psychophysical experiments, the JND of the aggregate visuo-haptic perception is the harmonic mean of the squared JNDs of the individual perceptions [Ernst:2002aa], in which the JND of each perception is proportional to the PER [Shi:10].
As a result, the PERs associated with the visuo-haptic VR traffic should be adjusted so as to achieve a target JND, while abiding by the eMBB and URLLC service objectives in terms of PERs and data rates. Due to the discrepancy of the visual and haptic service requirements, it is difficult to support both perceptions through either eMBB or URLLC links. Hence, it is necessary to slice the visuo-haptic VR traffic into eMBB and URLLC links, leading to URLLC-eMBB multimodal transmissions. Unfortunately, such multimodal transmissions bring about multimodal self-interference, which is manifested through an actual wireless interference as visualized in Figs. 1-a and b, or via the necessity to share resources as shown in Figs. 1-b and c.
This critical self-interference can be alleviated by multiplexing the URLLC-eMBB multimodal transmissions over the transmit power domain with successive interference cancellation (SIC) at reception, i.e., downlink non-orthogonal multiple access (NOMA) [3GPPMUST:2015], as illustrated in Fig. 1-b. Alternatively, as Fig. 1-c shows, the self-interference can be avoided via orthogonal multiple access (OMA) such as frequency division multiple access (FDMA). In this paper, using stochastic geometry, we investigate the optimal design of NOMA and OMA to support visuo-haptic VR perceptions while coping with the multimodal self-interference in a large-scale downlink system.
Related Works – The communication and computation resource management of mobile VR networks has recently been investigated in [EjderVR:17, OsvaldoAR:17, ChenSaad:17, ParkWCL:18, Elbamby:18], particularly under a VR social network application [ParkWCL:18] and a VR gaming scenario [Elbamby:18]. The end-to-end latency has been studied in [OsvaldoAR:17] for a single-cell scenario and in [ChenSaad:17, ParkWCL:18] for a multi-cell scenario. These works focus primarily on supporting either visual or haptic perceptions. Towards supporting multimodal perceptions, suitable network architecture and coding design have been proposed in [Zhang:18, Steinbach:12], while not specifying the requirements on the wireless links. In an uplink single-cell system, orthogonal/non-orthogonal multiplexing of URLLC and eMBB links has been optimized by exploiting their reliability diversity in [Petar5G:18].
Contributions – The main contributions of this work are summarized as follows.
To the best of our knowledge, this is the first work that combines both visual and haptic modalities in the context of mobile VR network design.
To support visuo-haptic VR perceptions, an optimal downlink NOMA design with reliability-ordered SIC has been proposed (see Lemma 2 and Proposition 4).
Compared to an OMA baseline (see Proposition 2), it has been observed that the proposed NOMA becomes preferable under a higher target integrated-perceptual resolution and/or a higher target rate for haptic perceptions (see Fig. 3).
By using stochastic geometry, closed-form average rate expressions have been derived for downlink URLLC-eMBB multiplexing under OMA and NOMA in a large-scale cellular network (see Propositions 1 and 3).
II System Model and Problem Formulation
In this section, we first introduce the downlink system operation of OMA and NOMA under a single-cell scenario, and describe its extension to the operation under a large-scale network. Then, we specify visuo-haptic perceptions, followed by the problem formulation of visuo-haptic VR traffic slicing and multiplexing.
The user under study requests visuo-haptic VR perceptions that are supported through URLLC-eMBB cellular links. We use the subscript to indicate the URLLC link with and the eMBB link with . The subscript identifies OMA and NOMA, respectively.
II-A Single-Cell Channel Model with OMA and NOMA
In a downlink scenario, we consider a single user that is associated with a single base station (BS). For both OMA and NOMA, the transmissions of the BS at a given time occupy up to the frequency bandwidth normalized to one, which is divided into the number of miniblocks. Each miniblock is assumed to be within the frequency-time channel coherence intervals. The channel coefficients are thus constants within each miniblock, and fade independently across different miniblocks over frequency and time. The transmit power of the BS is equally divided for each miniblock, normalized to one.
II-A1 Single-Cell OMA
A set of miniblocks are allocated to , with . Each set corresponds to a fraction , with . The transmit power allocations to and are set as the maximum transmit power per miniblock. Denoting as the transmit power allocation fraction to miniblock , this corresponds to the allocations that equal .
The user’s received signal-to-noise ratio (SNR) is determined by small-scale and large-scale fading gains. For a given user-BS association distance , the large-scale fading gains of and are identically given as with the path loss exponent . For miniblock , the small-scale fading gain is an exponential random variable with unit mean, which is independent and identically distributed (i.i.d.) across different miniblocks. The user’s received SNR of through miniblock is then expressed as
where is the noise spectral density of a single miniblock.
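As a rough illustration of the single-cell OMA channel described above, the following sketch draws per-miniblock SNR samples. It assumes unit transmit power per miniblock, a unit-mean exponential small-scale power gain, and a power-law path loss. The function and argument names (`oma_snr_samples`, `noise_power`, and so on) are hypothetical and do not come from the paper.

```python
import random


def oma_snr_samples(distance, path_loss_exp, noise_power, num_miniblocks, seed=0):
    """Draw per-miniblock received SNRs for one OMA link.

    Assumptions (illustrative only): unit transmit power per miniblock,
    unit-mean exponential fading power gain that is i.i.d. across miniblocks,
    and large-scale gain distance ** (-path_loss_exp).
    """
    rng = random.Random(seed)
    large_scale_gain = distance ** (-path_loss_exp)
    return [
        rng.expovariate(1.0) * large_scale_gain / noise_power
        for _ in range(num_miniblocks)
    ]


# Example: 10,000 miniblocks at distance 2 with path loss exponent 4.
samples = oma_snr_samples(distance=2.0, path_loss_exp=4.0,
                          noise_power=1e-3, num_miniblocks=10000)
mean_snr = sum(samples) / len(samples)
```

Because the fading gain has unit mean, the sample mean should be close to the large-scale gain divided by the noise power.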
II-A2 Single-Cell NOMA
The entire bandwidth is utilized for both links in NOMA, i.e., the miniblock allocation fractions equal . This is enabled by transmitting the superposition of the signals intended for and , with their different transmit power allocations, and then by decoding the signals with SIC at reception [3GPPMUST:2015]. The transmit power allocated to has a fraction of the maximum transmit power per miniblock, with .
At reception, unless otherwise noted, we consider that is decoded prior to . This SIC order implicitly captures the low-latency guarantee of , as addressed in [Petar5G:18] for an uplink scenario. Furthermore, it improves the overall NOMA system performance due to the reliability diversity of and , to be elaborated in the section on the optimal NOMA design.
With the said SIC order, the signal intended for is first decoded, while treating the signal for as noise, i.e., multimodal self-interference. The decoded signal is then removed by applying SIC, and the remaining signal for is finally decoded without self-interference. The user’s received SNR for through miniblock is thereby obtained as
Note that all the fading gains of and are identically since their channels are identical.
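The two-stage SIC reception described above can be sketched as follows. This is a minimal example that assumes the first-decoded link is the URLLC link, that both links share the same channel gain, and that cancellation is perfect. The names `noma_sic_snrs` and `power_fraction_first` are hypothetical.

```python
def noma_sic_snrs(power_fraction_first, fading_gain, distance,
                  path_loss_exp, noise_power):
    """Return (snr_first, snr_second) for two superposed links on one miniblock.

    power_fraction_first is the share of the unit per-miniblock transmit power
    given to the link decoded first (assumed here to be the URLLC link).
    The remaining share goes to the link decoded after cancellation.
    """
    if not 0.0 <= power_fraction_first <= 1.0:
        raise ValueError("power_fraction_first must lie in [0, 1]")
    channel = fading_gain * distance ** (-path_loss_exp)
    signal_first = power_fraction_first * channel
    signal_second = (1.0 - power_fraction_first) * channel
    # Stage 1: the second link's signal acts as self-interference.
    snr_first = signal_first / (signal_second + noise_power)
    # Stage 2: after perfect cancellation, only noise remains.
    snr_second = signal_second / noise_power
    return snr_first, snr_second
```

For instance, with a power fraction of 0.8, unit channel gain, and noise power 0.1, the first link sees 0.8 / (0.2 + 0.1) and the second link sees 0.2 / 0.1.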
II-B Channel Model under a Stochastic Geometric Network
By using stochastic geometry, the aforementioned single-cell operation of OMA and NOMA is extended to a large-scale multi-cell scenario as follows. The BSs under study are deployed in a two-dimensional Euclidean plane, according to a stationary Poisson point process (PPP) with density , where the coordinates of a BS belong to . Following the single-cell operation, each BS serves a single user through its and .
The locations of users follow an arbitrary stationary point process. Each user associates with the nearest BS, and downloads the visuo-haptic VR traffic through the and of the BS. Following [Andrews:2011bg], we focus our analysis on a typical user that is located at the origin and associated with the nearest BS located at position of the plane. This typical user captures the spatially-averaged performance, thanks to Slivnyak’s theorem [HaenggiSG] and the stationarity of .
In the previous single-cell scenario, interference occurs only from the multimodal self-interference under NOMA, as shown in (2). In addition to such intra-cell self-interference, extension to the stochastic geometric network model induces inter-cell interference. As done in [Andrews:2011bg, JHParkTWC:15, UR2Cspaswin:17], inter-cell interference is treated as noise, and is assumed to be large such that the maximum noise power is negligible. In this interference-limited regime, channel quality is measured not by SNR but by signal-to-interference ratio (SIR), as described next.
The inter-cell interference is measured by the typical user, and comes from the set of the BSs that are not associated with the typical user. We assume that every BS always utilizes the entire bandwidth and the maximum transmit power. The average inter-cell interference per miniblock is thus identical under both OMA and NOMA. The instantaneous inter-cell interference varies due to small-scale fading. For each miniblock, any interfering link’s small-scale fading is independent of the small-scale fading of the typical user’s desired and .
Under OMA, the typical user’s received SIR of through miniblock is thereby given as
where ’s are exponential random variables with unit mean, which are independent of and are i.i.d. across different interfering BSs. Likewise, under NOMA, the typical user’s received SIR of through miniblock is expressed as
It is noted that all the SIRs under NOMA and OMA are identically distributed across different miniblocks. For the typical user’s and , the large-scale fading gains are identical. Their small-scale fading gains are independent under OMA, but are fully correlated under NOMA.
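The interference-limited model above can be checked numerically with a short Monte Carlo sketch. It assumes a homogeneous PPP truncated to a finite disk, nearest-BS association, full-power interferers, and unit-mean exponential fading on every link. It also uses the standard fact that, for a planar PPP of density λ, the values λπr² of the ordered BS distances r form a unit-rate Poisson process on the half-line. All names (`sample_sir`, `estimate_coverage`) are hypothetical.

```python
import math
import random


def sample_sir(density, path_loss_exp, radius, rng):
    """One full-power SIR sample for a typical user at the origin.

    BS distances are generated in increasing order by accumulating unit-mean
    exponential gaps in the variable density * pi * r**2, truncated at radius.
    Returns None if the window holds fewer than two BSs.
    """
    distances = []
    scaled_area = 0.0
    limit = density * math.pi * radius ** 2
    while True:
        scaled_area += rng.expovariate(1.0)
        if scaled_area > limit:
            break
        distances.append(math.sqrt(scaled_area / (density * math.pi)))
    if len(distances) < 2:
        return None
    # The nearest BS serves the user; all others interfere at full power.
    signal = rng.expovariate(1.0) * distances[0] ** (-path_loss_exp)
    interference = sum(rng.expovariate(1.0) * d ** (-path_loss_exp)
                       for d in distances[1:])
    return signal / interference


def estimate_coverage(threshold, num_samples=2000, density=1.0,
                      path_loss_exp=4.0, radius=8.0, seed=1):
    """Empirical probability that the SIR is at least threshold."""
    rng = random.Random(seed)
    sirs = [sample_sir(density, path_loss_exp, radius, rng)
            for _ in range(num_samples)]
    sirs = [s for s in sirs if s is not None]
    return sum(1 for s in sirs if s >= threshold) / len(sirs)
```

For a path loss exponent of 4 and a threshold of 1, the estimate should land near 1 / (1 + π/4) ≈ 0.56, which is the known closed-form coverage probability for this setting in [Andrews:2011bg].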
II-C Average Rate with Decoding Success Guarantee
In a large-scale downlink cellular system with OMA and NOMA, we derive the typical user’s average rate that guarantees a target decoding success probability. Decoding becomes successful when the instantaneous downlink rate exceeds the transmitted coding rate.
To facilitate tractable analysis, we consider that the instantaneous channel information is not available at each BS. With the channel information at a BS, one can improve the average rate by adjusting the transmit power [Petar5G:18] and/or the coding rate [Andrews:2011bg, JHParkTWC:15]. In addition, we assume separate coding for each miniblock, which may lose frequency diversity gain compared to coding across multiple miniblocks [Petar5G:18, TseBook:FundamaentalsWC:2005].
With these assumptions and the SIRs that are identically distributed across miniblocks, the average rate is determined by the decoding success probability for any single miniblock. Therefore, we drop the superscript in and the small-scale fading terms, and derive the average rate in the sequel.
The typical user can decode the signal from with the decoding success probability that equals
where is the coding rate per miniblock, which is hereafter rephrased as a target threshold .
For the given target decoding success probability of , the average rate of is obtained by using outage capacity [TseOC:07] as
where the optimal target threshold satisfies , and thus equals .
Note that even when the coding block length of is short, the average rate expression in (10) still holds, since the finite-block length rate under fading channels converges to the outage capacity [DurisiPolyanski:14].
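The outage-capacity reasoning above can be illustrated with empirical SIR samples. The sketch assumes that the average rate takes the form (bandwidth fraction) × (target success probability) × log2(1 + threshold), where the threshold is the largest value whose exceedance probability still meets the target. This form and the function names are assumptions made for illustration, not the paper's exact expression.

```python
import math


def outage_threshold(sir_samples, success_prob):
    """Largest threshold whose empirical exceedance probability is at least
    success_prob, i.e., an empirical inverse of the coverage probability."""
    ordered = sorted(sir_samples, reverse=True)
    # Number of samples that must reach the threshold.
    count = max(1, math.ceil(success_prob * len(ordered)))
    return ordered[count - 1]


def outage_rate(sir_samples, success_prob, bandwidth_fraction=1.0):
    """Average successfully decoded rate (bits/s/Hz) under the assumed form."""
    threshold = outage_threshold(sir_samples, success_prob)
    return bandwidth_fraction * success_prob * math.log2(1.0 + threshold)
```

A stricter reliability target pushes the threshold toward the lower tail of the SIR distribution, so the coding rate drops as the target success probability approaches one.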
With the SIC order that decodes prior to , the typical user’s decoding success probabilities and of and are given as
Following [JindalSIC:09], our SIC does not allow decoding the signal after the decoding failure of the signal. With a different SIC architecture that allows such a decoding attempt, (12) is regarded as the lower bound, as done in [Petar5G:18].
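Under this SIC rule, the second link succeeds only if both decoding stages succeed. Because the two links share the same fading gain under NOMA, both events reduce to conditions on a single full-power SIR, here called S. With a power fraction β for the first-decoded link, the first stage needs βS / ((1 − β)S + 1) ≥ θ₁ and the second needs (1 − β)S ≥ θ₂. The sketch below evaluates both probabilities from SIR samples. The symbols β, θ₁, θ₂, and S and all function names are hypothetical placeholders.

```python
import math


def required_full_power_sir(power_fraction_first, threshold_first, threshold_second):
    """Full-power SIR needed for the first stage and for both stages."""
    beta = power_fraction_first
    if threshold_first <= 0.0:
        need_first = 0.0
    else:
        # beta*S / ((1-beta)*S + 1) >= t1  <=>  S * (beta - (1-beta)*t1) >= t1
        margin = beta - (1.0 - beta) * threshold_first
        need_first = threshold_first / margin if margin > 0.0 else math.inf
    if threshold_second <= 0.0:
        need_second_alone = 0.0
    else:
        need_second_alone = (threshold_second / (1.0 - beta)
                             if beta < 1.0 else math.inf)
    # The second link is attempted only after the first one is decoded.
    return need_first, max(need_first, need_second_alone)


def noma_success_probabilities(full_power_sirs, power_fraction_first,
                               threshold_first, threshold_second):
    """Empirical decoding success probabilities (first link, second link)."""
    need_first, need_both = required_full_power_sir(
        power_fraction_first, threshold_first, threshold_second)
    total = len(full_power_sirs)
    prob_first = sum(1 for s in full_power_sirs if s >= need_first) / total
    prob_second = sum(1 for s in full_power_sirs if s >= need_both) / total
    return prob_first, prob_second
```

By construction, the second link's success probability never exceeds the first link's, which matches the reliability ordering implied by decoding the more reliable link first.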
For the given target decoding success probability of , the average rate of is given as
where the optimal target threshold equals . Similarly, for the given target decoding success probability of , the average rate of is given as
II-D Visuo-Haptic Perceptual Resolution
The resolution of human perceptions is often measured by using JND in psychophysics. In a psychophysical experiment, the JND is calculated as the minimum stimulus variation that can be detected during % of the trials [Ernst:2002aa]. For a visuo-haptic perception, its integrated JND is obtained by combining the JNDs of visual and haptic perceptions.
To elaborate, when individual haptic and visual perceptions have the perceived noise variances and , a human brain combines these perceptions, yielding an integrated noise variance that satisfies . This relationship was first discovered in [Ernst:2002aa] by measuring the corresponding JNDs that are proportional to the perceived noise variances. The said relationship is thus read as , where denotes the JND measured when using both visual and haptic perceptions, while and identify the JNDs of the individual haptic and visual perceptions, respectively.
For individual visual perceptions, it has been reported by another experiment [Shi:10] that the PER is proportional to its sole JND due to the resulting visual frame loss. Similarly, for individual haptic perceptions, it has been observed in [ShiHirche:16] that the PER is proportional to the elapsed time to complete a given experimental task, which increases with the corresponding JND due to the coarse perceptions. Based on such experimental evidence, we can write that , where represents the PER on .
Accordingly, the JND of visuo-haptic perceptions is obtained from the following equation
In the following subsection, we adjust the target decoding success probabilities and of and , so as to guarantee a target visuo-haptic JND , i.e., .
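To make the JND integration concrete, the following sketch assumes the inverse-square combination rule associated with [Ernst:2002aa], namely 1/J² = 1/J_h² + 1/J_v², together with a linear map from PER to the individual JND. The scale factors and all names (`integrated_jnd`, `jnd_from_per`, `meets_target`) are hypothetical placeholders. The actual proportionality constants would have to come from the cited experiments.

```python
import math


def integrated_jnd(jnd_haptic, jnd_visual):
    """Combine two positive JNDs via the assumed inverse-square rule."""
    if jnd_haptic <= 0.0 or jnd_visual <= 0.0:
        raise ValueError("JNDs must be positive")
    return 1.0 / math.sqrt(1.0 / jnd_haptic ** 2 + 1.0 / jnd_visual ** 2)


def jnd_from_per(per, scale):
    """Hypothetical linear model: the JND is proportional to the PER."""
    return scale * per


def meets_target(per_haptic, per_visual, scale_haptic, scale_visual, target_jnd):
    """Check whether the integrated JND does not exceed the target JND."""
    combined = integrated_jnd(jnd_from_per(per_haptic, scale_haptic),
                              jnd_from_per(per_visual, scale_visual))
    return combined <= target_jnd
```

Under this rule the integrated JND is always smaller than either individual JND, so the more reliable modality dominates the combined resolution.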
II-E URLLC-eMBB Multiplexing Problem Formulation
In a downlink cellular system serving visuo-haptic VR traffic, haptic and visual perceptions are supported through and , respectively. Each link pursues different service objectives as follows. The URLLC aims at:
(i) Ensuring a target decoding success probability ; and
(ii) Ensuring a target average rate .
In contrast, the eMBB aims at:
(iii) Maximizing the average rate ; while
(iv) Ensuring a target decoding success probability , with .
In (iv), follows from experimental evidence that the quality of visual perceptions dramatically drops when PER exceeds a certain limit, e.g., % PER that equals [Shi:10].
In addition to these individual service objectives, with and , their aggregate JND should guarantee a target visuo-haptic JND . The said service objectives and requirements of and are described in the following problem formulation.
The objective functions and in the constraint (18b) are obtained from (10) for OMA and from (14) for NOMA. In the constraint (18c), is provided in (17). Without loss of generality, we hereafter consider a sufficiently large number of miniblocks so that the miniblock allocation fraction under OMA is treated as a continuous value.
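With a continuous allocation fraction, the interplay between the URLLC rate target and the eMBB rate objective under OMA can be sketched as a simple feasibility check. The sketch assumes that each link's average rate scales linearly with its bandwidth fraction and that the per-unit-bandwidth rates are already fixed by the chosen decoding success probabilities. It is an illustration of the constraint structure only, not the paper's optimal solution, and the name `oma_split` is hypothetical.

```python
def oma_split(target_urllc_rate, urllc_rate_per_unit_bw, embb_rate_per_unit_bw):
    """Give URLLC the smallest bandwidth fraction that meets its target rate
    and hand the remainder to eMBB.

    Returns (urllc_fraction, embb_rate), or None when the target rate cannot
    be met even with the whole bandwidth.
    """
    if urllc_rate_per_unit_bw <= 0.0:
        return None
    urllc_fraction = target_urllc_rate / urllc_rate_per_unit_bw
    if urllc_fraction > 1.0:
        return None
    embb_rate = (1.0 - urllc_fraction) * embb_rate_per_unit_bw
    return urllc_fraction, embb_rate
```

For example, a URLLC target of 0.5 with a per-unit-bandwidth rate of 2.0 needs a quarter of the band, which leaves three quarters for eMBB.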
III Optimal Multiplexing of Visuo-Haptic VR Traffic under OMA and NOMA
In this section, we optimize the multiplexing of and that support visuo-haptic VR traffic. With P1, for OMA, we optimize the miniblock allocation from the unit frequency block to each link. For NOMA, on the other hand, we optimize the power allocation from the unit transmit power.
III-A Optimal OMA
We aim at optimizing the miniblock allocation . To this end, for given and , we derive the average rate with . This requires taking the inverse function of in (8).
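When the decoding success probability has no convenient analytical inverse, it can be inverted numerically, since it decreases monotonically in the threshold. The sketch below uses bisection in the log domain. As a test function it uses the interference-limited coverage probability for path loss exponent 4 with Rayleigh fading, which is a known special case from [Andrews:2011bg]. Treating that special case as the quantity to invert is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def coverage_alpha4(threshold):
    """Interference-limited coverage probability for path loss exponent 4."""
    if threshold <= 0.0:
        return 1.0
    root = math.sqrt(threshold)
    return 1.0 / (1.0 + root * (math.pi / 2.0 - math.atan(1.0 / root)))


def invert_coverage(coverage, target_prob, low=1e-9, high=1e9, iterations=200):
    """Find the threshold at which a decreasing coverage function equals
    target_prob, by bisection on a logarithmic scale."""
    if not coverage(high) <= target_prob <= coverage(low):
        raise ValueError("target probability is outside the bracketed range")
    for _ in range(iterations):
        mid = math.sqrt(low * high)  # geometric midpoint
        if coverage(mid) >= target_prob:
            low = mid   # coverage still meets the target; threshold can grow
        else:
            high = mid
    return math.sqrt(low * high)
```

Calling `invert_coverage(coverage_alpha4, 0.99)` returns the largest threshold that is still met with probability 0.99 under these assumptions.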
The typical user’s is commonly referred to as coverage probability, and its closed-form expression can be derived by using stochastic geometry [Andrews:2011bg, Haenggi:ISIT14]. Namely, is given as