Distributed Relay Selection in Presence of Dynamic Obstacles in Millimeter Wave D2D Communication

10/31/2019 ∙ by Durgesh Singh, et al.

Millimeter wave (mmWave) device-to-device (D2D) communication is highly susceptible to obstacles due to severe penetration losses and requires an almost line-of-sight (LOS) communication path. The D2D channel condition is local to the devices/user equipments (UEs) and hence is not directly visible to the base station (BS). Thus the quality of the D2D channel must be propagated to the BS by the UEs, which may incur some delay. Hence, because of moving obstacles, the solution that the BS provides to the UEs using this gathered channel information may become less useful for establishing communication. Such obstacles may not be known in advance and may therefore cause unpredictable fluctuations in the D2D channel quality. We therefore learn the D2D channels using the finite horizon partially observable Markov decision process (POMDP) framework, which models the uncertainty in network environments with dynamic obstacles. The objective is to minimize delay when the channel quality deteriorates by letting each UE choose locally the better of two decisions: i) continue on the current relay link on which communication is taking place, or ii) switch to another good relay by exploring other possible UEs in its locality. We derive an optimal threshold policy that tells the UE which decision to take locally. We then give a simplified, easy-to-implement stationary threshold policy that counts the number of successive acknowledgement failures, based on which the UE makes the appropriate decision locally. Through extensive simulation, we demonstrate that our approach outperforms recent algorithms.




I Introduction

Device-to-device (D2D) communication in 5G may bypass the base station (BS) so that devices or user equipments (UEs) communicate directly with one another. It helps reduce outage, reuse resources, and meet the increasing bandwidth requirements of devices. D2D communication is generally studied for short-distance communication, which makes millimeter wave (mmWave) a suitable candidate for it [1]. Although mmWave has a very large available bandwidth, it suffers from very high propagation losses, which may be compensated using directional beams from multiple-input multiple-output (MIMO) antennas. However, the penetration loss of mmWaves is also very severe for most outdoor materials [2, 3]. This renders mmWave unsuitable in the presence of such obstacles, which may completely block the mmWave signal. Selecting relays to avoid obstacles has been studied in various works [4, 5, 6, 7, 8]. Most of these works analyze static obstacles; the problem of choosing relays becomes more challenging when the obstacles are also moving.

The D2D channel condition may deteriorate rapidly due to obstacles, especially moving ones. This in turn causes link breakage and hence packet loss and delay. The BS cannot sense the quality of a D2D channel directly, so this information must be communicated by the UEs to the BS. Using the gathered information, the BS may suggest that a source UE continue communication via another relay. However, this incurs some delay, and by the time the UEs receive the global solution from the BS, it may have become less useful because of a possible blockage by some dynamic obstacle. Other parameters local to a UE (such as battery, channel availability, and perceived throughput) may further complicate implementing the global solution [9]. For mmWave communication, capturing dynamic obstacles is a challenging task, and information about them may not be available a priori to the local nodes/UEs. Radars can be used to sense obstacles [10, 11], but placing radars to detect moving obstacles may be too expensive. To deal with the uncertainty caused by dynamic obstacles, a learning-based approach using a partially observable Markov decision process (POMDP) [12, 13, 14] is an appropriate choice. We can use past information about the D2D channel quality to learn it. In fact, the presence of dynamic obstacles is also captured indirectly while learning the channel quality.

In this paper, we model the problem of relay selection at each UE locally as a finite horizon POMDP to capture the uncertainty caused in a D2D channel by moving obstacles. The state of the D2D channel is not observable at the current time instant. It can only be observed, in the form of acknowledgements (ACKs), after deciding to transmit packets on a chosen link. Information about dynamic obstacles is not known at the BS a priori and can only be learned from UE feedback after communication has been established. Even the ACKs can be lost due to dynamic obstacles. A UE transmitting data packets may initially choose the relay suggested by the BS. However, at later time instants, the channel quality of the suggested link may deteriorate and cause heavy packet loss and delay. We use the conditional probability of the D2D channel quality given the ACK history as the sufficient statistic, also called the belief probability of the given link. We then derive an optimal policy that maps the belief to a set of actions. The chosen action is either to continue on the current link or to stop and explore other possibly available relay links. Later, by exploiting the structure of the derived policy, we obtain a stationary policy that tells the UE whether to continue transmitting on the chosen relay link after several successive ACK failures. This lets the UE stop sending packets on the current link (after some successive ACK failures) to avoid packet loss and mitigate delay. This method is compared with other state-of-the-art solutions: i) one based on a recent work that selects relays based on maximum throughput [7], and ii) a received signal strength (RSS) based approach. We show in simulation that our proposed method outperforms these approaches.

Our contributions in this paper are summarized as follows:

  1. We consider the effects of dynamic obstacles on D2D mmWave links, which is a new and challenging topic.

  2. We formulate the problem of relay selection as a POMDP, and show that the optimal policy checks whether a certain belief probability exceeds a threshold. This is a non-trivial result that required proof of several interesting intermediate results.

  3. Our optimal policy can be implemented locally at each node, thereby facilitating distributed implementation.

  4. The threshold policy is further reduced to counting the number of successive ACK failures, which is simple and easy to implement.

The rest of the paper is organized as follows. The system model is described in Section II. The POMDP formulation is provided in Section III. The optimal policy structure is derived in Section IV. Numerical results are provided in Section V, followed by the conclusions in Section VI. All proofs are provided in the appendix.

II System Model

We consider the device tier of the 5G D2D architecture described in [15], where devices can communicate among themselves with or without help from the BS. The service region, served by one BS, is divided into zones or grids as shown in Figure 1. Each zone may have many UEs and is assumed to have at least one D2D device that is ready to take part in D2D communication as a relay or as a source/destination node. We define a sending zone as a zone in which at least one UE wants to transmit data to a UE of some other zone. If $Z_i$ is the sending zone, then it may form a connection to a UE of another zone $Z_j \in \mathcal{V}_i$, where $\mathcal{V}_i$ is the set of viable relay zones of zone $Z_i$, given by the BS. A viable relaying zone of zone $Z_i$ is one that is nearer to the zone containing the destination UE and is within the communication range of zone $Z_i$. When the UE in zone $Z_i$ forms a connection with a UE of zone $Z_j$, the connection is termed link $(i,j)$. A link is formed between UEs of two zones when they are within communication range of each other and the received signal strength is sufficient for the required data rate. UEs communicate with one another on mmWave channels using directional antennas. The received signal strength $\mathrm{RSS}_{ij}$ at zone $Z_j$ from zone $Z_i$ is modeled as [7]:
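$$\mathrm{RSS}_{ij}\,[\mathrm{dBm}] = P_i + G_t + G_r - PL(d_{ij}) + X_{ij},$$

where $X_{ij}$ is the shadowing random variable, $P_i$ is the transmit power of UE $i$, $G_t$ and $G_r$ are the transmit and receive beamforming gains respectively, and $PL(d_{ij})$ is the distance-dependent path loss function of the distance $d_{ij}$ between the two UEs.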
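To make the RSS model concrete, the following Python sketch evaluates it in dB units, $\mathrm{RSS} = P_i + G_t + G_r - PL(d) + X$. The log-distance form $PL(d) = PL(d_0) + 10\eta\log_{10}(d/d_0)$, the 68 dB reference loss at 1 m (roughly the free-space loss at 60 GHz), and all numeric values are assumptions for illustration, not the paper's simulation parameters; `path_loss_db`, `rss_dbm`, and `sample_rss_dbm` are hypothetical helper names.

```python
import math
import random


def path_loss_db(d_m: float, pl_d0_db: float = 68.0, eta: float = 2.0, d0_m: float = 1.0) -> float:
    """Assumed log-distance path loss: PL(d) = PL(d0) + 10 * eta * log10(d / d0)."""
    return pl_d0_db + 10.0 * eta * math.log10(d_m / d0_m)


def rss_dbm(p_tx_dbm: float, g_t_dbi: float, g_r_dbi: float, d_m: float,
            shadowing_db: float = 0.0) -> float:
    """RSS = P_i + G_t + G_r - PL(d_ij) + X_ij, everything in dB units."""
    return p_tx_dbm + g_t_dbi + g_r_dbi - path_loss_db(d_m) + shadowing_db


def sample_rss_dbm(p_tx_dbm, g_t_dbi, g_r_dbi, d_m, sigma_db, rng):
    """One RSS draw with zero-mean log-normal shadowing (i.e., Gaussian in dB)."""
    return rss_dbm(p_tx_dbm, g_t_dbi, g_r_dbi, d_m, rng.gauss(0.0, sigma_db))


if __name__ == "__main__":
    rng = random.Random(1)
    print(rss_dbm(20.0, 15.0, 15.0, 10.0))                 # deterministic part only
    print(sample_rss_dbm(20.0, 15.0, 15.0, 10.0, 4.0, rng))  # with shadowing
```

Working in dB turns the multiplicative gains and losses into sums, which is why the shadowing term is simply a Gaussian added to the deterministic link budget.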

Time is discretized as $t = nN + k$, measured in slots of duration $\tau$, as shown in Figure 2, where $n$ belongs to the set of nonnegative integers, $k$ takes integer values in $\{0, 1, \dots, N-1\}$, and $\tau$ is the smaller discretized time slot in which the UEs transmit packets locally. It is assumed that $\tau$ (for each $k$) is long enough to send one packet of $D$ bytes. Here, $N$ is the number of time slots (each of duration $\tau$) between two consecutive global decisions by the BS. The BS makes a global decision at time $t$ when $t$ is divisible by $N$. At this time instant, the BS collects the channel state information from all UEs in the service region and decides the best relaying UE of a given zone for a given source UE. Hence, between two consecutive global decision instants, a UE can send at most $N$ packets of $D$ bytes to another UE. Note that at time $t = nN$, the UE chooses the relay link suggested by the BS, and at times $t = nN + k$ with $k \ge 1$, the BS has no control over the UEs. At global time instants, the BS sends two types of information to the UEs: i) the best relay UE (or node) for a given source UE, and ii) the viable relaying zones $\mathcal{V}_i$ for a given source zone $Z_i$, so that zone $Z_i$ may choose an appropriate zone for relaying data from the set $\mathcal{V}_i$.
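The slot bookkeeping above can be sketched as follows, assuming slots are indexed by a global integer $t$ and that $N$ slots separate consecutive BS decisions; the function names are hypothetical.

```python
def split_slot(t: int, N: int) -> tuple[int, int]:
    """Return (n, k) such that t = n*N + k with 0 <= k < N."""
    if t < 0 or N <= 0:
        raise ValueError("t must be nonnegative and N positive")
    return divmod(t, N)


def is_global_decision(t: int, N: int) -> bool:
    """The BS makes a global decision only when t is divisible by N (k == 0)."""
    return split_slot(t, N)[1] == 0


if __name__ == "__main__":
    N = 5
    for t in range(12):
        n, k = split_slot(t, N)
        who = "BS decides relay" if is_global_decision(t, N) else "UE decides locally"
        print(f"t={t:2d}  n={n}  k={k}  {who}")
```

Only the slots with $k = 0$ carry a BS decision; the remaining $N-1$ slots in each window are where the local policy of the following sections applies.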

There are static and dynamic obstacles in the service region, and no facility such as radar is available at the BS to track them. The behavior of dynamic obstacles is not known a priori and must be learned online from the received acknowledgements of sent packets. Since mmWaves are highly susceptible to obstacles and suffer from severe penetration loss, we assume that even a single moving or static obstacle may break an already established D2D link and cause packet loss. It is assumed that the mobility of UEs within a zone over a duration of $N\tau$ does not take them outside the zone and hence does not cause link outage. Hence the only factors responsible for link breakage and packet loss are obstacles and channel fading.

The source/relay node takes a local decision when the current link quality is not good enough: the node locally explores and switches to another one-hop node, incurring a penalty. This exploration is done over the set of zones $\mathcal{V}_i$ given by the BS to find the best relaying zone for that time instant. Note that the UE uses directional mmWave antennas to explore its neighbors, and this exploration is assumed to incur a significant delay. Both exploration and packet loss are assumed to consume one time unit $\tau$. It is assumed that the relay link is established within this exploration time.

Fig. 1: Service region divided into zones along with dynamic obstacles.
Fig. 2: Discretized time slots with the smallest slot duration of $\tau$.

III Problem Formulation as POMDP

Zone $Z_i$ is the sending zone, which contains at least one UE that needs to transmit data to a UE of some other zone $Z_j \in \mathcal{V}_i$. This is termed link $(i,j)$ for the given sending zone $Z_i$. Hence zone $Z_j$ may contain a relaying UE or the destination UE. The BS gives its global decision on the best relay at time instant $nN$, to be used for relaying data packets until time instant $(n+1)N - 1$. Both static and dynamic obstacles are present in the environment, which causes uncertainty in the channel quality. Moreover, the BS has no direct knowledge of the D2D channel conditions. Consequently, the quality of the relay link given by the BS may deteriorate, causing packet loss and delay in data transmission. We need to control this packet loss over the duration $N\tau$. However, the BS has no control over the data packets sent between time instants $nN$ and $(n+1)N$. Hence, when the current relay link has become bad, the node must locally select the best relay zone from $\mathcal{V}_i$ under uncertainty about the D2D channels. Since the channel condition is uncertain and unknown before actually establishing a connection and transferring packets, we formulate this problem as a finite horizon POMDP [14].

For the duration between instants $nN$ and $(n+1)N$, the time instants are referred to as $k = 0, 1, \dots, N-1$. We want to derive a decision criterion for choosing the appropriate action (continue with the current relaying zone, or explore and switch to some other zone) that leads the system to a good state. A good state is one with minimum packet loss (and in turn minimum delay), considering all the required penalty costs. Hence our objective is to minimize the delay cost incurred due to packet loss while choosing appropriate relays and keeping the exploration and switching cost as low as possible. In the following paragraphs, we describe the state, action, observation, probabilistic structure, costs, and cost function of our POMDP.

For a given sending zone $Z_i$, the state of each of its possible relay links $(i,j)$, $Z_j \in \mathcal{V}_i$, at time $k$ is written as $x_k^{ij} \in \{G, B\}$, which signifies that relay link $(i,j)$ is in the good ($x_k^{ij} = G$) or bad ($x_k^{ij} = B$) state. The relay link is in the good state when the channel quality is as required and a packet is transmitted successfully without being blocked by obstacles, whereas in the bad state the channel quality drops and packet loss occurs. The action set is $u_k^{ij} \in \{u^s, u^c\}$, where $u^s$ denotes explore and switch to another link and $u^c$ denotes transmit on the current link (zone). The local node in zone $Z_i$ makes an observation at each smaller time instant after a packet is sent. This observation is in the form of an ACK test, denoted $z_k^{ij} \in \{G, B\}$. Here, $z_k^{ij} = B$ represents that the acknowledgement is not received for link $(i,j)$, typically because the link is bad and the packet is lost, and $z_k^{ij} = G$ represents that the acknowledgement is received, i.e., the link is good and the packet was transmitted successfully. We also write $G$ and $B$ for the ACK being received or not ($1$ or $0$), respectively. Since ACKs are quick and available in a negligible amount of time, for state $x_{k+1}^{ij}$ and action $u_k^{ij}$, the observation (ACK) is $z_{k+1}^{ij}$. The ACK itself may also be lost due to the unpredictable behavior of the channel under consideration.

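The probabilistic structure of the observation assumed here is shown in Figure 3 and written as:

$$P(z_{k+1}^{ij} = G \mid x_{k+1}^{ij} = G) = \gamma, \qquad P(z_{k+1}^{ij} = B \mid x_{k+1}^{ij} = G) = 1 - \gamma,$$
$$P(z_{k+1}^{ij} = G \mid x_{k+1}^{ij} = B) = 0, \qquad P(z_{k+1}^{ij} = B \mid x_{k+1}^{ij} = B) = 1.$$

If the system is in the bad state $x_k^{ij} = B$ at time $k$, then the probability of obtaining a good observation is zero, $P(z_k^{ij} = G \mid x_k^{ij} = B) = 0$, which is intuitive: no ACK can return over a blocked link. The probabilistic structure assumed for the system state transition is:

$$P(x_{k+1}^{ij} = G \mid x_k^{ij} = G) = \alpha, \qquad P(x_{k+1}^{ij} = B \mid x_k^{ij} = G) = 1 - \alpha,$$
$$P(x_{k+1}^{ij} = G \mid x_k^{ij} = B) = \beta, \qquad P(x_{k+1}^{ij} = B \mid x_k^{ij} = B) = 1 - \beta.$$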
Here $\alpha$, $\beta$ and $\gamma$ are, respectively, the probability that a good link stays good, the probability that a bad link becomes good, and the probability that the ACK is received successfully when the link is in the good state. It is intuitive and legitimate to assume that $\alpha > \beta$. The transition probability $1 - \alpha$ is the probability that a good link becomes bad due to obstacles or signal fading. Similarly, $1 - \beta$ is the probability that a bad link stays bad (for obstacles, it indicates that an obstacle is either long or moving slowly and thus affects the link for a longer period).
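A minimal simulation of this two-state link and its ACK channel is sketched below: each slot the hidden state makes a Markov transition (stay good with probability $\alpha$, recover with probability $\beta$), and an ACK arrives with probability $\gamma$ only if the new state is good. The parameter values and the name `simulate_link` are illustrative assumptions.

```python
import random


def simulate_link(alpha: float, beta: float, gamma: float, steps: int,
                  x0: str = "G", seed: int = 0):
    """Simulate the hidden link state x_k in {'G', 'B'} and the observed ACKs.

    alpha: P(G -> G), beta: P(B -> G), gamma: P(ACK | link good).
    A bad link never returns an ACK, i.e. P(ACK | B) = 0.
    """
    rng = random.Random(seed)
    x = x0
    states, acks = [], []
    for _ in range(steps):
        p_good = alpha if x == "G" else beta          # transition to x_{k+1}
        x = "G" if rng.random() < p_good else "B"
        ack = (x == "G") and (rng.random() < gamma)   # observation z_{k+1}
        states.append(x)
        acks.append(ack)
    return states, acks


if __name__ == "__main__":
    states, acks = simulate_link(alpha=0.9, beta=0.2, gamma=0.8, steps=20, seed=42)
    print("".join(states))
    print("".join("A" if a else "-" for a in acks))
```

Comparing the two printed rows shows why the state is only partially observable: a missing ACK (`-`) may come from either a bad link or a lost ACK on a good one.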

Fig. 3: Probabilistic structure of the problem at a node locally.

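For a given relaying zone $Z_j$, let

$$I_k^{ij} = \left(z_1^{ij}, \dots, z_k^{ij},\; u_0^{ij}, \dots, u_{k-1}^{ij}\right)$$

denote the information vector available locally to zone $Z_i$ up to smaller time instant $k$. We define the conditional state distribution $p_k^{ij}$, which acts as the sufficient statistic or belief [14, Chapter 5], locally for the given relaying link $(i,j)$ as:

$$p_k^{ij} = P\left(x_k^{ij} = G \mid I_k^{ij}\right).$$

This is the probability that the relaying link is in the good state given the history of information. The estimator function of the local system is:

$$p_{k+1}^{ij} = \Phi\left(p_k^{ij}, u_k^{ij}, z_{k+1}^{ij}\right).$$

Using Bayes' rule with $u_k^{ij} = u^c$, and writing $\pi_k^{ij} = \alpha p_k^{ij} + \beta(1 - p_k^{ij})$ for the predicted probability that the link is good at time $k+1$, we get

$$\Phi\left(p_k^{ij}, u^c, z_{k+1}^{ij}\right) = \frac{P\left(x_{k+1}^{ij} = G,\, z_{k+1}^{ij} \mid I_k^{ij}, u^c\right)}{P\left(z_{k+1}^{ij} \mid I_k^{ij}, u^c\right)} = \begin{cases} 1, & z_{k+1}^{ij} = G, \\[4pt] \dfrac{(1-\gamma)\,\pi_k^{ij}}{1 - \gamma\,\pi_k^{ij}}, & z_{k+1}^{ij} = B. \end{cases}$$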

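The belief update follows directly from the observation and transition probabilities: an ACK can only come over a good link, so it resets the belief to 1, while a missing ACK gives $(1-\gamma)\pi/(1-\gamma\pi)$ with $\pi = \alpha p + \beta(1-p)$. The sketch below assumes that model; `predicted_good` and `update_belief` are hypothetical names, and the function raises an error for an observation that has zero probability under the model.

```python
def predicted_good(p: float, alpha: float, beta: float) -> float:
    """pi = alpha * p + beta * (1 - p): probability the link is good at the next slot."""
    return alpha * p + beta * (1.0 - p)


def update_belief(p: float, ack: bool, alpha: float, beta: float, gamma: float) -> float:
    """Bayes update Phi(p, z): P(link good | history, new ACK observation)."""
    pi = predicted_good(p, alpha, beta)
    if ack:
        if gamma * pi == 0.0:
            raise ValueError("an ACK has zero probability under this model")
        return 1.0                       # an ACK can only arrive over a good link
    p_no_ack = 1.0 - gamma * pi          # (1 - gamma) * pi + (1 - pi)
    if p_no_ack == 0.0:
        raise ValueError("a missing ACK has zero probability under this model")
    return (1.0 - gamma) * pi / p_no_ack


if __name__ == "__main__":
    p = 1.0
    for ack in [False, False, True, False]:
        p = update_belief(p, ack, alpha=0.9, beta=0.2, gamma=0.8)
        print(f"ack={ack!s:5}  belief={p:.4f}")
```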
The cost structure is defined as follows: when packet loss occurs, $c$ is the penalty (in terms of delay) incurred to overcome it. Since after a packet loss we may need to explore, this is also the cost of exploration. When there is no packet loss, the cost incurred is $0$. The cost of testing for an ACK is negligible and is hence taken as $0$. Our objective is to derive a decision rule for choosing the appropriate action (continue with the current relaying zone, or explore and switch to some other zone) that leads the system to a good state and in turn causes minimum packet loss, considering all the required costs. The expected cost is formulated as a dynamic program (DP). At the end of the last period, i.e., period $N-1$, the expected cost is defined as:

$$J_{N-1}\left(p_{N-1}^{ij}\right) = \min\left[\, c,\; \bar{c}\left(p_{N-1}^{ij}\right) \right]. \tag{5}$$

Note that at the last time instant $N-1$, packet loss can be due to two types of events: i) the link is in the bad state and the packet is lost, and ii) the link is in the good state but the ACK is not received due to bad channel quality. Hence the expected penalty is $\bar{c}(p) = c\,[1 - \gamma(\alpha p + \beta(1-p))]$, i.e., $c$ times the probability that no ACK is received. For the time instant $N-2$, we have

$$J_{N-2}\left(p_{N-2}^{ij}\right) = \min\left[\, c,\; \bar{c}\left(p_{N-2}^{ij}\right) + \mathbb{E}_{z_{N-1}^{ij}}\left\{ J_{N-1}\left(\Phi\left(p_{N-2}^{ij}, u^c, z_{N-1}^{ij}\right)\right) \,\middle|\, I_{N-2}^{ij}, u^c \right\} \right], \tag{6}$$

where $\bar{c}(p_{N-2}^{ij}) = c\,[1 - \gamma(\alpha p_{N-2}^{ij} + \beta(1 - p_{N-2}^{ij}))]$ is the expected penalty paid due to packet loss at time $N-2$. The first term in the minimization denotes the exploring and switching cost, and the second term denotes the cost of continuing on the current relay link. Similarly, the DP for each $k = 0, 1, \dots, N-2$ is:

$$J_k\left(p_k^{ij}\right) = \min\left[\, c,\; \bar{c}\left(p_k^{ij}\right) + \mathbb{E}_{z_{k+1}^{ij}}\left\{ J_{k+1}\left(\Phi\left(p_k^{ij}, u^c, z_{k+1}^{ij}\right)\right) \,\middle|\, I_k^{ij}, u^c \right\} \right], \tag{7}$$

where $\bar{c}(p_k^{ij}) = c\,[1 - \gamma(\alpha p_k^{ij} + \beta(1 - p_k^{ij}))]$ is the expected penalty paid due to packet loss at time instant $k$. Solving this DP yields a criterion on which the local decision, to switch the link or to remain on it, can be based. Hence, for a given relay zone $Z_j$ at time instant $k$, we want to minimize the cost $J_k(p_k^{ij})$. This criterion is analyzed in the next section, where we derive a policy that maps the belief to an action; the policy, and hence the action taken, optimizes our objective function.
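A small numerical sketch of this DP by backward induction is given below. It assumes, as the two terms of the minimization indicate, that explore-and-switch costs $c$ and ends the decision problem for the current link, while continuing costs the expected packet-loss penalty $c[1-\gamma(\alpha p+\beta(1-p))]$ plus the expected future cost under the ACK-based belief update. The function name, the grid resolution, and the parameter values are illustrative assumptions, and the threshold is read off a discrete belief grid rather than computed in closed form.

```python
import math
from functools import lru_cache


def solve_thresholds(alpha, beta, gamma, c, N, grid_size=1001):
    """Return [theta_0, ..., theta_{N-1}]: smallest grid belief at which continuing is optimal.

    math.inf means explore-and-switch is optimal for every belief on the grid.
    """
    def continue_cost(k, p):
        pi = alpha * p + beta * (1.0 - p)       # P(link good at k+1 | p_k = p)
        cost = c * (1.0 - gamma * pi)           # expected packet-loss penalty
        if k == N - 1:                          # last period: no future cost
            return cost
        p_ack = gamma * pi
        future = p_ack * J(k + 1, 1.0)          # an ACK resets the belief to 1
        if p_ack < 1.0:                         # missing ACK: Bayes-updated belief
            p_nack = (1.0 - gamma) * pi / (1.0 - p_ack)
            future += (1.0 - p_ack) * J(k + 1, p_nack)
        return cost + future

    @lru_cache(maxsize=None)
    def J(k, p):
        return min(c, continue_cost(k, p))      # min[explore & switch, continue]

    grid = [i / (grid_size - 1) for i in range(grid_size)]
    thresholds = []
    for k in range(N):
        cont = [p for p in grid if continue_cost(k, p) <= c]
        thresholds.append(cont[0] if cont else math.inf)
    return thresholds


if __name__ == "__main__":
    th = solve_thresholds(alpha=0.9, beta=0.2, gamma=0.8, c=1.0, N=10)
    print([round(t, 3) for t in th])
```

Because the per-step costs are nonnegative, the continuation cost can only grow with the remaining horizon, so the computed thresholds are non-increasing in $k$, and at the last period continuing is always at least as cheap as switching.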

IV Derivation of the Optimal Policy

IV-A Properties of $J_k$

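At the end of period $N-1$, the expected cost is as given in equation (5). The general expression for time instant $k$ in equation (7) can be written equivalently as:

$$J_k\left(p_k^{ij}\right) = \min\left[\, c,\; \bar{c}\left(p_k^{ij}\right) + \mathbb{E}_{z_{k+1}^{ij}}\left\{ J_{k+1}\left(\Phi\left(p_k^{ij}, u^c, z_{k+1}^{ij}\right)\right) \right\} \right], \tag{8}$$

$$\mathbb{E}_{z_{k+1}^{ij}}\left\{ J_{k+1}\left(\Phi\left(p_k^{ij}, u^c, z_{k+1}^{ij}\right)\right) \right\} = \sum_{z \in \{G, B\}} P\left(z_{k+1}^{ij} = z \mid p_k^{ij}, u^c\right) J_{k+1}\left(\Phi\left(p_k^{ij}, u^c, z\right)\right), \tag{9}$$

where $\bar{c}(p_k^{ij}) = c\,[1 - \gamma(\alpha p_k^{ij} + \beta(1 - p_k^{ij}))]$ is the expected packet-loss penalty and $\Phi$ is the belief estimator.

For notational simplicity, we now drop the superscript $ij$ from each of the respective notations; e.g., we write $p_k^{ij}$ as $p_k$ and $z_{k+1}^{ij}$ as $z_{k+1}$. Since only the continue action $u^c$ leads to a belief update, $\Phi(p_k^{ij}, u^c, z_{k+1}^{ij})$ can now be denoted as $\Phi(p_k, z_{k+1})$.

Writing $\pi_k = \alpha p_k + \beta(1 - p_k)$ for the predicted probability that the link is good at time $k+1$, the probability of receiving an ACK is $\gamma\pi_k$, and since an ACK reveals that the link is good, $\Phi(p_k, G) = 1$. We can therefore reduce the expectation in equation (9) to:

$$\mathbb{E}_{z_{k+1}}\left\{ J_{k+1}\left(\Phi(p_k, z_{k+1})\right) \right\} = \gamma\pi_k\, J_{k+1}(1) + (1 - \gamma\pi_k)\, J_{k+1}\!\left( \frac{(1-\gamma)\pi_k}{1 - \gamma\pi_k} \right). \tag{10}$$

As an example, we use this expansion to simplify equation (6) as:

$$J_{N-2}(p_{N-2}) = \min\left[\, c,\; c\,(1 - \gamma\pi_{N-2}) + \gamma\pi_{N-2}\, J_{N-1}(1) + (1 - \gamma\pi_{N-2})\, J_{N-1}\!\left( \frac{(1-\gamma)\pi_{N-2}}{1 - \gamma\pi_{N-2}} \right) \right]. \tag{11}$$

At period $N-2$, as shown in equation (11), the local node has computed its belief that the relay link is still good and then decides whether to continue on the already selected relay link or to explore and switch to another relay node, incurring the extra cost $c$. In equation (11), $c\,(1 - \gamma\pi_{N-2})$ is the expected penalty incurred due to packet loss, and the remaining two terms give the expected cost to be incurred at the upcoming time instant $N-1$.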

We now show in Proposition 1 that the functions $J_k$ are piecewise linear and concave for each $k$.

Proposition 1.

$J_k(p)$ is piecewise linear and concave in $p$ for each $k$.

Proof. See appendix. ∎

Proposition 2.


Also, , , .


Proof. See appendix. ∎

IV-B Policy Structure

The structure of an optimal policy for our POMDP problem is given by the following theorem.

Theorem 1.

The optimal policy for our POMDP problem is a threshold policy. At any time instant $k$, the optimal action is to continue transmission on the current relay link if $p_k \ge \theta_k$, and to explore and switch to another, better relay link if $p_k < \theta_k$. Moreover, the threshold $\theta_k$ is non-increasing in $k$.

Proof. See appendix. ∎

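As $N \to \infty$, the threshold $\theta_k$ at any fixed $k$ converges to some scalar $\theta^*$, since by Theorem 1 it is monotone in the remaining horizon $N - k$ and it is bounded within $[0, 1]$, and a bounded monotone sequence always converges. Hence, for a very long horizon $N$, the optimal policy can be approximated by a stationary threshold policy with the time-invariant threshold $\theta^*$.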

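Note that if $z_{k+1} = G$, then $p_{k+1} = 1$. Hence, without loss of generality, let us assume that $p_0 = 1$. If $z_1 = B$, then $p_1 = \Phi(1, B) = \frac{(1-\gamma)\alpha}{1 - \gamma\alpha}$, which is less than $p_0 = 1$ whenever $\alpha < 1$; here $\Phi(p, B)$ denotes the belief update after a missing ACK. Now, it is easy to check that $\Phi(p, B)$ is a strictly increasing function of $p$ (because $\alpha > \beta$ and $\gamma < 1$). Hence, if $z_2 = B$, then $p_2 = \Phi(p_1, B) < \Phi(p_0, B) = p_1$. Proceeding in this way, we can show that $p_k$ strictly decreases with $k$ whenever we observe successive ACK failures. We can therefore define recursively the belief after $m$ successive ACK failures as $q_0 = 1$, $q_1 = \Phi(q_0, B)$, and $q_m = \Phi(q_{m-1}, B)$ for $m \ge 2$. Let $m^*$ be the smallest integer such that $q_{m^*} < \theta^*$, where $\theta^*$ is the stationary threshold. We can then further simplify the stationary threshold policy as follows.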
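The successive-failure beliefs $q_m$ and the switching count $m^*$ can be computed with a few lines of Python. This is a sketch under the model above; in practice $\theta^*$ would come from solving the DP (for example, numerically), and the threshold value used here, like the function names, is an assumption. The search can return `None` because $q_m$ decreases toward a fixed point that is positive when $\beta > 0$, so a very low threshold may never be crossed.

```python
def failure_beliefs(alpha: float, beta: float, gamma: float, m_max: int) -> list[float]:
    """q_0 = 1 and q_m = Phi(q_{m-1}, B): belief after m successive ACK failures."""
    q = [1.0]
    for _ in range(m_max):
        pi = alpha * q[-1] + beta * (1.0 - q[-1])
        q.append((1.0 - gamma) * pi / (1.0 - gamma * pi))
    return q


def min_failures_to_switch(theta_star: float, alpha: float, beta: float, gamma: float,
                           m_max: int = 1000):
    """Smallest m with q_m < theta_star, or None if the beliefs never drop that low."""
    for m, q_m in enumerate(failure_beliefs(alpha, beta, gamma, m_max)):
        if q_m < theta_star:
            return m
    return None


if __name__ == "__main__":
    print(failure_beliefs(0.9, 0.2, 0.8, 4))
    print(min_failures_to_switch(0.5, 0.9, 0.2, 0.8))
```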

Simplified stationary threshold policy: Let $m^*$ be the smallest integer such that $q_{m^*} < \theta^*$. If there are $m^*$ successive ACK failures, explore and switch to another, better relay link; otherwise, continue transmission on the current relay link.
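The simplified policy reduces to a per-link counter of consecutive ACK failures. A minimal sketch, assuming the counter is reset whenever an ACK arrives (which resets the belief to 1) and after every switch; the class and method names are hypothetical, and $m^* \ge 1$ is assumed.

```python
class SuccessiveFailurePolicy:
    """Switch relay after m_star consecutive missing ACKs; otherwise keep transmitting."""

    def __init__(self, m_star: int):
        if m_star < 1:
            raise ValueError("m_star must be at least 1")
        self.m_star = m_star
        self.failures = 0              # consecutive ACK failures on the current link

    def step(self, ack_received: bool) -> str:
        if ack_received:
            self.failures = 0          # an ACK resets the belief to 1
            return "continue"
        self.failures += 1
        if self.failures >= self.m_star:
            self.failures = 0          # start counting afresh on the new relay link
            return "switch"
        return "continue"


if __name__ == "__main__":
    policy = SuccessiveFailurePolicy(m_star=2)
    for ack in [True, False, True, False, False]:
        print(ack, policy.step(ack))
```

The UE thus needs only an integer counter instead of the full belief recursion, which is what makes the stationary policy easy to run locally.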

V Experiments and Results

V-A Simulation Environment

We have divided the service region of square area into zones in the form of grids, each of dimension . Each zone has enough UEs to form a D2D link with UEs of other zones. In the experiment, is taken to be . Nodes use directional transmit and receive antennas at frequency with , and we consider a scenario where the line-of-sight path loss exponent is and the zero-mean log-normal shadowing random variable has standard deviation [16, 17]. The thermal noise density is and devices use transmit power. The capacity of each link is $C = W \log_2(1 + \mathrm{SNR})$, where $W$ [18] is the bandwidth and $\mathrm{SNR}$ is the received signal-to-noise ratio. We assume a fixed packet length of . There are in total static and dynamic obstacles present in the environment, where . Static obstacles are placed uniformly in the service region. Each static obstacle is assumed to have the dimension of a grid, so all communication passing through a grid that contains a static obstacle is blocked. The dynamic obstacles move randomly and independently of each other and follow a simple blockage model: with probability a dynamic obstacle blocks a given link; otherwise it does not. We assume that a given zone can connect to another zone among at most neighboring zones surrounding it, i.e., . We assume a single source-destination pair for simplicity; all other devices in a given zone may act as relays.
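The link-capacity formula can be combined with an RSS value to check whether one packet fits in a slot $\tau$. The sketch converts a noise power spectral density and bandwidth into noise power, then applies $C = W\log_2(1+\mathrm{SNR})$. The example values (noise density $-174$ dBm/Hz, a 2.16 GHz channel, a 1500-byte packet, and an RSS of $-50$ dBm) are placeholders chosen for illustration, not the simulation's parameters, and the function names are hypothetical.

```python
import math


def snr_linear(rss_dbm: float, noise_density_dbm_hz: float, bandwidth_hz: float) -> float:
    """SNR = RSS / (N0 * W), computed in dB and converted to a linear ratio."""
    noise_dbm = noise_density_dbm_hz + 10.0 * math.log10(bandwidth_hz)
    return 10.0 ** ((rss_dbm - noise_dbm) / 10.0)


def capacity_bps(bandwidth_hz: float, snr: float) -> float:
    """Shannon capacity C = W * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + snr)


def packet_time_s(packet_bytes: int, capacity: float) -> float:
    """Time to send one packet at rate C; the slot tau must be at least this long."""
    return 8.0 * packet_bytes / capacity


if __name__ == "__main__":
    W = 2.16e9                                  # placeholder bandwidth (Hz)
    snr = snr_linear(-50.0, -174.0, W)          # placeholder RSS of -50 dBm
    C = capacity_bps(W, snr)
    print(f"SNR = {10 * math.log10(snr):.1f} dB, C = {C / 1e9:.2f} Gbit/s")
    print(f"one 1500-byte packet takes {packet_time_s(1500, C) * 1e6:.3f} us")
```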

We implemented our own custom simulator in C++, compiled it with the GNU compiler, and ran it on an Intel Core machine. We run around 10000 simulation runs and average the results per run and per hop for two metrics: packet loss per packet delivered and end-to-end (E2E) delay per packet. Packet loss per packet delivered is defined as the ratio of the number of lost packets to the number of packets successfully delivered to the destination. E2E delay is the total time (in seconds) to send a packet successfully from the source UE to the destination UE, ignoring queuing delays. We analyze these metrics with respect to the number of dynamic obstacles. We compare our proposed approach with two baselines: 1) an approach that selects the relay link based on received signal strength (RSS Based), and 2) an approach that selects the relay link based on maximum overall throughput (ThroughPut Based) [7].

V-B Experimental Results & Analysis

Figure 4 compares the packet loss per successfully delivered packet against the number of dynamic obstacles. As the number of dynamic obstacles increases, the packet loss per successfully delivered packet also increases. This is expected: with more dynamic obstacles, the chance of a link being blocked increases, and hence so does the packet loss. Our proposed method outperforms the other algorithms because it learns the quality of the D2D links from the ACKs and switches to a better relay when the quality of the current D2D link deteriorates.

Figure 5 shows the E2E delay per packet against the number of dynamic obstacles. Here too, the delay increases with the number of dynamic obstacles, because more dynamic obstacles cause more packet loss and hence extra delay. Our proposed method outperforms the other algorithms for the same reason as above.

V-C Discussions

The proposed method can be run locally on each UE to choose an optimal relay at the time instants when the BS has no control and the D2D channel quality becomes bad. The results show that packet loss increases rapidly as the number of obstacles increases, since with more obstacles the chance of a link being blocked increases, and so does the expected number of blocked links. However, the number of dynamic obstacles may be so large that no D2D link is free from blockage. In this case, our algorithm cannot provide better links, because it will not find any link that satisfies the derived threshold policy. In such scenarios with very dense dynamic obstacles, we observe empirically that the packet loss is negligible, but very few packets are delivered successfully, so the delay may also increase. In these scenarios, an appropriate solution would be to opt for relays placed at some height above the ground, or to choose transmission over the conventional microwave links of the BS, which are less susceptible to blockage by obstacles.

Fig. 4: Packet loss per packet delivered vs No. of dynamic obstacles
Fig. 5: E2E delay per packet (in seconds) vs No. of dynamic obstacles

VI Conclusion

The D2D channel quality is not directly visible to the BS, which makes choosing a relay for D2D communication challenging when dynamic obstacles are present in the environment. Such obstacles cause unpredictable fluctuations in the D2D channel quality, so their effect must be learned from the channel statistics. We have modeled the problem of relay selection in the presence of dynamic obstacles as a finite horizon POMDP at each UE, which captures the uncertainty arising from dynamic obstacles. Using this model, we have derived an optimal threshold policy for each UE that maps the belief to an action. We then derived a simple stationary policy that lets the UE decide locally, after successive ACK failures on the current relay link, whether to continue on it or to explore and switch to another relay link. Through simulations, we show that our approach captures the effects of dynamic obstacles and outperforms other state-of-the-art algorithms.


  • [1] G. H. Sim, A. Loch, A. Asadi, V. Mancuso, and J. Widmer, “5G millimeter-wave and D2D symbiosis: 60 GHz for proximity-based services,” IEEE Wireless Communications, vol. 24, pp. 140–145, Aug. 2017.
  • [2] H. Zhao, R. Mayzus, S. Sun, M. Samimi, J. K. Schulz, Y. Azar, K. Wang, G. N. Wong, F. Gutierrez, and T. S. Rappaport, “28 GHz millimeter wave cellular communication measurements for reflection and penetration loss in and around buildings in New York City,” in 2013 IEEE International Conference on Communications (ICC), pp. 5163–5167, June 2013.
  • [3] J. Qiao, X. S. Shen, J. W. Mark, Q. Shen, Y. He, and L. Lei, “Enabling device-to-device communications in millimeter-wave 5G cellular networks,” IEEE Communications Magazine, vol. 53, pp. 209–215, Jan. 2015.
  • [4] T. Bai and R. W. Heath, “Coverage and rate analysis for millimeter-wave cellular networks,” IEEE Transactions on Wireless Communications, vol. 14, pp. 1100–1114, Feb. 2015.
  • [5] B. Xie, Z. Zhang, and R. Q. Hu, “Performance study on relay-assisted millimeter wave cellular networks,” in 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring), pp. 1–5, May 2016.
  • [6] S. Biswas, S. Vuppala, J. Xue, and T. Ratnarajah, “An analysis on relay assisted millimeter wave networks,” in 2016 IEEE International Conference on Communications (ICC), pp. 1–6, May 2016.
  • [7] N. Wei, X. Lin, and Z. Zhang, “Optimal relay probing in millimeter-wave cellular systems with device-to-device relaying,” IEEE Transactions on Vehicular Technology, vol. 65, pp. 10218–10222, Dec. 2016.
  • [8] W. Kim, J. Song, and S. Baek, “Relay-assisted handover to overcome blockage in millimeter-wave networks,” in 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp. 1–5, Oct. 2017.
  • [9] A. Orsino, A. Samuylov, D. Moltchanov, S. Andreev, L. Militano, G. Araniti, and Y. Koucheryavy, “Time-dependent energy and resource management in mobility-aware D2D-empowered 5G systems,” IEEE Wireless Communications, vol. 24, pp. 14–22, Aug. 2017.
  • [10] J. Park and R. W. Heath, “Analysis of blockage sensing by radars in random cellular networks,” IEEE Signal Processing Letters, vol. 25, pp. 1620–1624, Nov. 2018.
  • [11] D. Singh and S. C. Ghosh, “Network-assisted D2D relay selection under the presence of dynamic obstacles,” CoRR, vol. abs/1907.08500, 2019.
  • [12] M. Abu Alsheikh, D. T. Hoang, D. Niyato, H. Tan, and S. Lin, “Markov decision processes with applications in wireless sensor networks: A survey,” IEEE Communications Surveys & Tutorials, vol. 17, pp. 1239–1267, third quarter 2015.
  • [13] K. Kaza, R. Meshram, and S. N. Merchant, “Relay employment problem for unacknowledged transmissions: Myopic policy and structure,” in 2017 IEEE International Conference on Communications (ICC), pp. 1–7, May 2017.
  • [14] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, 4th ed.
  • [15] M. N. Tehrani, M. Uysal, and H. Yanikomeroglu, “Device-to-device communication in 5G cellular networks: Challenges, solutions, and future directions,” IEEE Communications Magazine, vol. 52, no. 5, pp. 86–92, 2014.
  • [16] N. Deng and M. Haenggi, “A fine-grained analysis of millimeter-wave device-to-device networks,” IEEE Transactions on Communications, vol. 65, pp. 4940–4954, Nov. 2017.
  • [17] T. S. Rappaport, G. R. MacCartney, M. K. Samimi, and S. Sun, “Wideband millimeter-wave propagation measurements and channel models for future wireless communication system design,” IEEE Transactions on Communications, vol. 63, pp. 3029–3056, Sep. 2015.
  • [18] A. Al-Hourani, S. Chandrasekharan, and S. Kandeepan, “Path loss study for millimeter wave device-to-device communications in urban environment,” in 2014 IEEE International Conference on Communications Workshops (ICC), pp. 102–107, 2014.