Deep Reinforcement Learning for Dynamic Spectrum Sharing of LTE and NR

02/22/2021
by Ursula Challita, et al.

In this paper, a proactive dynamic spectrum sharing scheme between 4G and 5G systems is proposed. In particular, a controller decides on the resource split between NR and LTE every subframe while accounting for future network states such as high interference subframes and multimedia broadcast single frequency network (MBSFN) subframes. To solve this problem, a deep reinforcement learning (RL) algorithm based on Monte Carlo Tree Search (MCTS) is proposed. The introduced deep RL architecture is trained offline whereby the controller predicts a sequence of future states of the wireless access network by simulating hypothetical bandwidth splits over time starting from the current network state. The action sequence resulting in the best reward is then assigned. This is realized by predicting the quantities most directly relevant to planning, i.e., the reward, the action probabilities, and the value for each network state. Simulation results show that the proposed scheme is able to take actions while accounting for future states instead of being greedy in each subframe. The results also show that the proposed framework improves system-level performance.

I Introduction

Dynamic spectrum sharing (DSS) has emerged as an effective solution for a smooth transition from 4G to 5G by introducing 5G systems in existing 4G bands without hard/static spectrum refarming [1]. Using DSS, 4G LTE [2] and 5G NR [3] can operate in the same frequency band, where a controller distributes the available spectrum resources dynamically over time between the two radio access technologies (RATs). For instance, in LTE-NR downlink (DL) sharing, the LTE scheduler loans resources to NR for a certain time, and NR avoids the symbols used by LTE for cell-specific signals. Moreover, DSS helps ease the transition from non-standalone to standalone 5G networks. That said, it is important to investigate an effective scheme for the bandwidth (BW) split between LTE and NR to reap the benefits of DSS.

While recent literature has studied the problem of spectrum sharing between LTE and WiFi (i.e., LTE-unlicensed) [4], NR and WiFi (i.e., NR-unlicensed) [5], aerial and ground networks [6], and radars and communication systems [7], the performance analysis of 4G/5G DSS remains relatively scarce [8]. For instance, an instant spectrum sharing technique at subframe time scale has been proposed in [8]. The proposed scheme takes into account several pieces of cell information, such as the amount of data in the buffer, and splits the BW between 4G and 5G in every transmission time interval (TTI). Despite the promising results, this work considers a reactive spectrum sharing approach that does not account for future network states, which degrades performance. In a proactive approach, by contrast, rather than reactively splitting the BW based on incoming demands and serving them when requested, the network takes future states into account for 4G/5G spectrum sharing, thus improving the overall system-level performance.

The main contribution of this paper is to introduce a novel model-based deep reinforcement learning (RL) algorithm for DSS between LTE and NR. The main scope of the proposed scheme is planning in the time domain, whereby the controller distributes the communication resources dynamically over time and frequency between LTE and NR at a subframe level while accounting for future network states over a specific time horizon. To enable efficient planning, we propose a deep RL technique based on Monte Carlo Tree Search (MCTS) [9]. When a model of the environment is available, algorithms like AlphaZero [10] have been used with great success. However, in the case of DSS, the LTE and NR schedulers are part of the environment, and these are not easily modelled. Inspired by the MuZero work [11], we use a learned model of the environment for planning in the time domain. When applied iteratively, the proposed solution predicts the quantities most directly relevant to planning, i.e., the reward, the action probabilities, and the value for each state. This in turn enables the controller to predict a sequence of future states of the wireless network by simulating hypothetical communication resource assignments over time, starting from the current network state, and evaluating a reward function for each hypothetical assignment over the time window. The communication resources in the current subframe are then assigned according to the simulated hypothetical BW split action that maximizes the reward over the time window. To the best of our knowledge, this is the first work that exploits the framework of deep RL for DSS between 4G and 5G systems. Simulation results show that the proposed approach improves quality of service in terms of latency. Results also show that the proposed algorithm yields gains in different scenarios by accounting for several features while planning in the time domain, such as multimedia broadcast single frequency network (MBSFN) subframes and diverse user service requirements.

The rest of this paper is organized as follows. Section II presents the system model. Section III describes the proposed deep RL algorithm. In Section IV, simulation results are analyzed. Finally, conclusions are drawn in Section V.

II System Model

Consider the downlink of a wireless cellular system composed of a co-located cell operating over NR and LTE and serving a set of users. NR and LTE are assumed to operate in the 3.5 GHz frequency band and to apply FDD as the duplexing method. We consider a 15 kHz NR numerology and assume that LTE and NR subframes are aligned in time and frequency. Each RAT r serves a set of UEs U_r. The total system bandwidth is divided into a set of resource blocks (RBs). Each RAT r is allocated a subset of these RBs, and each UE is allocated a set of RBs by its serving RAT.

For an efficient spectrum sharing model of LTE and NR, one must design a mechanism for dividing the available bandwidth for data and control transmission between the two RATs. For the control region, we consider the following:

  • LTE PDCCH is restricted to symbols #0 and #1 (if NR PDCCH is present).

  • NR has no signals/channels in symbols #0 and #1.

  • NR PDCCH is limited to symbol #2, assuming that the UE only supports type-A scheduling (no mini-slots).

  • In LTE subframes where no NR PDCCH is transmitted in the overlapped NR slots, LTE PDCCH could span 3 symbols.

For data transmission, a controller decides on the resource split a_t between NR and LTE in every subframe t.
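As an illustration of this per-subframe decision, the following minimal sketch maps a quantized BW-split action to an LTE/NR resource-block split. The helper name action_to_split, the 25-PRB bandwidth (Table I), the 5-RB group size, and the convention that LTE occupies the lower part of the band (Section IV) are illustrative assumptions, not the paper's exact mapping.

```python
# Minimal sketch: map a quantized BW-split action to (LTE RBs, NR RBs).
# Assumptions: 25 PRBs in total, splits quantized to groups of 5 RBs,
# LTE on the lower part of the band and NR on the upper part.
N_PRB = 25
RB_GROUP = 5  # granularity of the quantized action set

def action_to_split(action_index: int):
    lte_rbs = action_index * RB_GROUP   # lower part of the band goes to LTE
    nr_rbs = N_PRB - lte_rbs            # remaining upper part goes to NR
    return lte_rbs, nr_rbs

# Example: action index 2 gives LTE 10 RBs and NR 15 RBs.
print(action_to_split(2))  # (10, 15)
```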

II-A Channel Model

We assume the 3GPP Urban Macro propagation model [12] with Rayleigh fading. The path loss between a UE at location x and its serving BS is given by Model 1 of [13] for the 3.5 GHz frequency band:

(1)

where d is the distance between the UE and the BS in meters. The signal-to-noise ratio (SNR) of the link for a UE at location x served by RAT r over RB b is:

γ_{r,b}(x) = P_{r,b}(x) g_{r,b}(x) / (N0 B_RB),   (2)

where P_{r,b}(x) is the transmit power of RAT r to the UE at location x over RB b, obtained from the total transmit power of RAT r, which is assumed to be distributed uniformly among all of its allocated RBs. g_{r,b}(x) is the channel gain between the UE at location x and RAT r on RB b, with h_{r,b}(x) the Rayleigh fading complex channel coefficient. N0 is the noise power spectral density and B_RB is the bandwidth of an RB. Therefore, the achievable data rate of a UE at location x associated with RAT r can be defined as:

R_r(x) = Σ_{b ∈ B_r} B_RB log2(1 + γ_{r,b}(x)).   (3)
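As a numerical illustration of (2)-(3), the short sketch below computes the SNR and rate for one UE under the stated assumptions of uniform power per RB and Rayleigh fading. The parameter values and the way the path loss of (1) enters the channel gain are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged numerical sketch of Eqs. (2)-(3): uniform power per RB, Rayleigh fading,
# path loss folded into the channel gain. Values are illustrative only.
import numpy as np

def achievable_rate(p_total_w, n_rbs, pathloss_db,
                    n0_w_per_hz=10 ** ((-174 - 30) / 10),   # thermal noise, -174 dBm/Hz
                    b_rb_hz=180e3, rng=np.random.default_rng(0)):
    """Sum-rate of one UE over its allocated RBs (bits/s)."""
    p_rb = p_total_w / n_rbs                                  # uniform power per RB
    h = rng.standard_normal(n_rbs) + 1j * rng.standard_normal(n_rbs)
    gain = (np.abs(h) ** 2 / 2) * 10 ** (-pathloss_db / 10)  # Rayleigh fading x path loss
    snr = p_rb * gain / (n0_w_per_hz * b_rb_hz)               # Eq. (2)
    return np.sum(b_rb_hz * np.log2(1 + snr))                 # Eq. (3)

# Example: 0.8 W/PRB over 10 PRBs with 110 dB path loss.
print(achievable_rate(p_total_w=8.0, n_rbs=10, pathloss_db=110.0))
```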

II-B Traffic Model

We assume periodic traffic arrivals per UE with a fixed periodicity and a fixed packet size. Time domain scheduling is typically governed by a scheduling weight, whereby a high weight corresponds to a high priority for scheduling that particular UE. We adopt a similar mechanism for measuring the quality of bandwidth splits between LTE and NR, where a UE not fulfilling its QoS is associated with a high weight. The weight w_u(t) for user u in subframe t can be calculated as:

w_u(t) = ε + ω_u · 1{d_u(t) > δ_u} if user u has data in its buffer, and w_u(t) = 0 otherwise,   (4)

where d_u(t) is the time the oldest packet of user u has been waiting in the buffer, δ_u and ω_u correspond to the step delay and step weight of the delay weight function of user u, respectively, and ε is a small positive factor that makes the weight non-zero whenever there is data in the buffer. Note that a UE with zero weight will not be scheduled. Here, the step delay δ_u corresponds to the maximum tolerable delay for maintaining QoS. If a packet remains in the buffer for a time period larger than δ_u, the weight of user u increases by ω_u.
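A minimal sketch of this step-shaped weight, assuming the piecewise form reconstructed in (4); the function and argument names are illustrative.

```python
# Hedged sketch of the delay weight of Eq. (4): zero with an empty buffer,
# a small epsilon once data is buffered, plus the step weight once the
# oldest packet exceeds the step delay (maximum tolerable delay).
def user_weight(buffer_bits, oldest_packet_delay_ms, step_delay_ms, step_weight, eps=0.01):
    if buffer_bits == 0:
        return 0.0                       # a UE with zero weight is not scheduled
    weight = eps                         # non-zero as soon as data is buffered
    if oldest_packet_delay_ms > step_delay_ms:
        weight += step_weight            # QoS delay bound violated
    return weight
```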

Given this system model, next, we develop an effective spectrum sharing scheme that can allocate the appropriate bandwidth to each RAT, at a subframe time scale, while accounting for future network states.

III Deep Reinforcement Learning for Dynamic Spectrum Sharing

In this section, we propose a proactive approach for DSS that enables LTE and NR to operate on the same BW simultaneously. In this regard, we propose a deep RL framework that enables the controller to learn the BW split between LTE and NR in each subframe t while accounting for future network states over a time window T. To this end, we first present the adopted RL algorithm for training the controller to learn the optimal BW split policy. Then, we introduce the RL architecture and its components for the DSS problem.

III-A Deep RL Algorithm

To enable a proactive BW split between LTE and NR, we adopt the MuZero algorithm [11]. A key challenge is that planning requires a model of the individual LTE and NR schedulers, which is hard to devise. Instead, we learn the scheduling dynamics via a model-based reinforcement learning algorithm that simultaneously learns a model of the environment's dynamics and plans with respect to the learned model [11]. This approach is more data efficient than model-free methods, whose state-of-the-art algorithms may require millions of samples before any near-optimal policy is learned.

During the training phase of the proposed algorithm, planning comprises performing an MCTS over the action space and over the time window T to find the sequence of actions that maximizes the reward function. MCTS iteratively explores the action space, gradually biasing the exploration towards regions of states and actions where an optimal policy might exist. To enable our model to learn the best explored sequence of actions for each network state, we define three neural networks: the representation function h, the dynamics function g, and the prediction function f. The motivation for incorporating each of these neural networks in the proposed algorithm is described as follows (a sketch of how they are composed follows the list):

  • The representation function h encodes the observation o_t of subframe t into an initial hidden state s_0.

  • The dynamics function g computes a new hidden state s_{k+1} and a reward r_{k+1} given the current hidden state s_k and an action a_{k+1}.

  • The prediction function f outputs a policy p_k and a value v_k from a hidden state s_k.
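The following minimal Python sketch shows how these three functions are composed during planning, i.e., a K-step unroll of the learned model starting from the current observation. Here h, g, and f are assumed to be learned networks (one possible architecture is sketched in Section IV), and the function name unroll is illustrative.

```python
# Minimal sketch of a K-step unroll of the learned model along a hypothetical
# sequence of BW-split actions; h, g, f are the representation, dynamics, and
# prediction functions (assumed to be trained networks returning the tuples below).
def unroll(h, g, f, obs, hypothetical_actions):
    """Roll the learned model forward from observation o_t."""
    state = h(obs)                       # s_0 = h(o_t)
    trajectory = []
    for a in hypothetical_actions:       # hypothetical actions a_1, ..., a_K
        policy, value = f(state)         # p_k, v_k = f(s_k)
        state, reward = g(state, a)      # s_{k+1}, r_{k+1} = g(s_k, a_{k+1})
        trajectory.append((policy, value, reward))
    return trajectory
```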

During the training phase, the model predicts the quantities most directly relevant to planning, i.e., the reward, the action probabilities and the value for each state. The proposed training algorithm is summarized in Algorithm 1 and the main steps are given as follows:

  • Step 1: The model receives the observation of the network state as an input and transforms it into an initial hidden state s_0 using the representation function h.

  • Step 2: The prediction function f is then used to predict the value v_k and the policy vector p_k for the current hidden state s_k.

  • Step 3: The hidden state is then updated iteratively to the next hidden state s_{k+1} by a recurrent process of K steps, using the dynamics function g with the previous hidden state s_k and a hypothetical next action a_{k+1} as inputs, i.e., a communication resource assignment selected from the action space of allowable bandwidth splits between LTE and NR.

  • Step 4: Having defined a policy target, reward, and value, the representation function h, the dynamics function g, and the prediction function f are trained jointly, end-to-end, by backpropagation through time (BPTT).

Meanwhile, the testing phase refers to the actual execution of the algorithm at run time, after the weights of h, g, and f have been optimized. Given that DSS is performed on a 1 ms basis, it is too demanding to run MCTS online. As such, we use only the representation function h and the prediction function f at test time. The main steps performed by the controller at test time are summarized in Algorithm 2.

Input: Representation (h), dynamics (g), and prediction (f) functions.
for iteration = 0 … N do
       Data Generation
       Step 1: Sample environments with random parameters.
       for each sampled environment do
             for subframe t = 0 … L do
                    Step 2: Encode the observation o_t into an initial hidden state, s_0.
                    Step 3: Run MCTS simulations from this state using g and f.
                    Step 4: Sample an action a_t to take in the environment.
                    Step 5: Store the policy and reward in the replay buffer.
             end for
       end for
       Neural Network Training
       for training step = 0 … M do
             Step 6: Sample a batch of sequences from the replay buffer.
             Step 7: Compute the total discounted reward over each sequence.
             Step 8: Take a training step using BPTT so that the predicted policy, value, and reward match their targets.
       end for
end for
Algorithm 1 Training phase
Input: Representation (h) and prediction (f) functions.
for each subframe t do
       Step 1: Encode the observation o_t into an initial hidden state, s_0.
       Step 2: Calculate the action probabilities using f and select the best action.
       Step 3: Find the corresponding BW split and send it to the schedulers.
end for
Algorithm 2 Execution phase
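A minimal Python sketch of this execution phase, assuming trained representation and prediction networks with the interfaces sketched in Section III-A and the hypothetical action_to_split helper from Section II; it simply picks the highest-probability action in each subframe.

```python
# Hedged sketch of Algorithm 2 (execution phase); the network objects and the
# action_to_split helper are the illustrative ones introduced earlier.
import torch

def execute_subframe(representation, prediction, observation):
    with torch.no_grad():
        state = representation(observation)      # Step 1: o_t -> s_0
        policy, _value = prediction(state)       # Step 2: action probabilities
        action = int(torch.argmax(policy))       # select the best action
    return action_to_split(action)               # Step 3: BW split sent to the schedulers
```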

III-B Deep RL Components

In this subsection, we define the RL framework components, namely the observations, actions, and rewards.

  • Action space: the BW split between LTE and NR for DL transmission in subframe t, denoted a_t and drawn from an action set of size A. Here, an action corresponds to a horizontal split of the BW, assigning one side to LTE and the other to NR. The possible BW splits are chosen by grouping multiple RBs, which results in a quantized action set. This in turn reduces the size of the action space and is valid because the gain between bandwidth splits of consecutive RBs is negligible.

  • Observation: the observation for subframe t, denoted o_t, is divided into two parts. The first part consists of components of size U (one entry per UE), whereas the second part consists of components of size U × T, where U is the number of UEs and T is the time window consisting of a set of future subframes. The different observation components are summarized as follows:

    • NR support: a vector with U × 1 elements that indicates whether each user is an NR user or not.

    • Buffer state: a vector with U × 1 elements containing the number of bits in the buffer of each user.

    • MBSFN subframe: a matrix with U × T elements that indicates, for each subframe in the time window, whether a UE is configured with MBSFN or not. By configuring LTE UEs with MBSFN subframes, some broadcast signalling can be avoided at the cost of decreased scheduling flexibility.

    • Predicted number of bits per PRB and TTI for each UE: a matrix with U × T elements, where each element contains the estimated average number of bits that can be transmitted for user u in subframe t, taking into account the estimated channel quality of user u during subframe t.

    • Predicted packet arrivals: a matrix with U × T elements indicating the number of bits that will arrive in the buffer of each user over the set of future subframes in the time window.

  • Reward function: the reward function is modelled as a summation of the exponential of the weight of the most delayed packet per user and can be expressed as:

    (5)

    where w_u(t) is the delay weight of user u in subframe t, as described in (4). The intuition behind this reward function is that a high total weight is penalized with a low reward in subframe t. Meanwhile, if the controller manages to keep the user buffers empty, the reward per subframe will be one. If a highly prioritized UE is queued for several subframes, its weight will increase and the reward will thus approach zero.
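Since the exact expression in (5) is not recoverable from the text here, the sketch below encodes one reading that is consistent with the stated behaviour (reward equal to one when all buffers are empty and approaching zero as weights grow), namely the exponential of the negative total weight; the precise form and its normalization are assumptions.

```python
# Hedged sketch of the per-subframe reward; the exponential-of-negative-total-weight
# form is an assumption consistent with the description above, not the paper's
# exact Eq. (5).
import math

def subframe_reward(user_weights):
    return math.exp(-sum(user_weights))   # 1.0 when all buffers are empty, -> 0 for large weights
```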

Figure 1: A schematic illustration of the proposed setup summarizing the connection between the controller, network state, and LTE and NR schedulers.

Figure 1 summarizes the relationship between the network state, controller, and LTE and NR schedulers. At each subframe, the LTE scheduler, NR scheduler, and controller receive the network state information. This information is then used by the controller to generate observations and thus take an action for the BW split between LTE and NR. This action is then conveyed to the LTE and NR schedulers. Given the network state information and the corresponding BW split, each of the schedulers allocates their respective users to the corresponding BW portion for the current subframe. Finally, the weights for the users are fed to the controller and used as an input for the calculation of the reward. Next, we provide simulation results and analysis for the proposed RL framework.

Parameter                Value
Frequency                3.5 GHz
Bandwidth                25 PRBs (5 MHz)
Traffic model            Periodic
UE speed                 3 m/s
Transmit power           0.8 W/PRB
Noise power (N0)         -112.5 dBm/PRB
Antenna configuration    1 Tx, 2 Rx
Table I: Simulation parameters for the radio environment.

IV Simulation Results and Analysis

In this section, we provide simulation results and analysis for the performance of the proposed algorithm under four different scenarios where planning in the time domain for dynamic spectrum sharing is relevant. Tables I and II provide a summary of the main simulation parameters.

Figure 2: Neural network architectures for a) the representation function, b) the dynamics function and c) the prediction function.

The structure of the representation, dynamics, and prediction neural networks is depicted in Figure 2. All dense layers except for the output layers use 64 activations with ReLU activation. The representation output (the hidden state) uses 10 activations. The reward and value outputs are scalars with linear activation, and the policy output has the same number of activations as the number of actions.

Parameter                                Value
Number of MCTS simulations               64
Episode length                           16 subframes
Discount factor                          0.99
Window size (T)                          10 subframes
Batch size                               32 examples
Number of unroll steps (K)               3
Number of TD steps                       16
Optimizer                                Adam
Learning rate
Number of episodes per iteration         100
Representation size                      10
Table II: Simulation parameters for the RL framework.
Input: Observation, o_t.
Output: Action, a_t.
Step 1: Calculate the weight of each user according to Eq. (4).
Step 2: Sort the users in order of decreasing weight.
Step 3: Schedule users from the list until the spectrum is full.
Step 4: Check how many PRBs are needed for the NR and LTE users.
Step 5: Select the action a_t that splits the BW proportionally between the RATs.
Algorithm 3 Baseline Algorithm
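A minimal Python sketch of this baseline, assuming each user is described by a dict with 'rat', 'weight', and 'prbs_needed' fields; the rounding of the proportional split is an assumption.

```python
# Hedged sketch of the baseline (Algorithm 3): schedule users by decreasing weight,
# then split the band proportionally to the per-RAT PRB demand.
def baseline_split(users, n_prb=25):
    scheduled, used = [], 0
    for u in sorted(users, key=lambda u: u['weight'], reverse=True):   # Steps 1-2
        if used + u['prbs_needed'] > n_prb:
            break                                                       # Step 3: spectrum full
        scheduled.append(u)
        used += u['prbs_needed']
    # Step 4: PRBs requested by each RAT among the scheduled users.
    lte_need = sum(u['prbs_needed'] for u in scheduled if u['rat'] == 'LTE')
    nr_need = sum(u['prbs_needed'] for u in scheduled if u['rat'] == 'NR')
    # Step 5: proportional split (equal split if nothing is scheduled).
    total = lte_need + nr_need
    lte_rbs = round(n_prb * lte_need / total) if total else n_prb // 2
    return lte_rbs, n_prb - lte_rbs
```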

Next, we provide a detailed description of the simulation results and analysis for each of the four studied scenarios. Note that in all of the scenarios the episode length is 16 subframes, so the evaluation score for a perfectly solved scenario is also 16. Moreover, we assume that LTE users (if any) are scheduled on the lower part of the spectrum band and NR users (if any) on the upper part. As the baseline, we split the available spectrum proportionally to the number of RBs required by the LTE and NR users, as summarized in Algorithm 3. We also compare the performance of the proposed algorithm to an equal BW split and to alternating the BW between LTE and NR. The user weight is calculated using Eq. (4), with the same ε and ω_u for all users. The step delay δ_u is set appropriately for the different users in the different scenarios, as specified below.

IV-A Scenario 1: MBSFN subframes

LTE requires cell-specific reference signals (CRSs) to enable demodulation of data. Therefore, if only NR UEs are scheduled, the CRSs are not needed and are hence an overhead. If there is a lot of NR traffic to be scheduled, LTE can be configured with so-called MBSFN subframes. In these subframes, no CRSs are transmitted; it is therefore not possible to schedule LTE users, but this can improve efficiency for NR users. This scenario investigates whether the controller can learn to account for MBSFN subframes during planning, enabling time-critical LTE traffic to be served before MBSFN subframes.

IV-A1 Scenario description

We consider two users, one NR user and one LTE user, both with a traffic arrival periodicity of 4 ms and a step delay δ_u = 3 ms. The packet size is 45000 bits for the NR user and 15000 bits for the LTE user. The system is configured with a repeating MBSFN pattern with a periodicity of 4 subframes, where the first two subframes are non-MBSFN (i.e., both LTE and NR UEs can be scheduled) and the last two subframes in the pattern are MBSFN subframes (i.e., only NR UEs can be scheduled).

IV-A2 Optimal bandwidth split

To solve this scenario optimally, both packets must be served within 3 ms. As such, the LTE user should be served in the non-MBSFN subframes to make resources available for the NR user later in the cycle. Therefore, the optimal strategy is to start scheduling LTE such that its buffer is emptied before the MBSFN subframes.

IV-A3 Results and analysis

From Figure 3, we can see that the proposed algorithm converges to the optimal strategy in 12 iterations. Note also that the proposed scheme outperforms both an equal bandwidth split between LTE and NR and the case where MBSFN subframes are allocated to the NR user and non-MBSFN subframes to the LTE user. With this MBSFN configuration, the overhead due to, e.g., LTE CRSs can be minimized, which improves efficiency at the network level. The controller learns to account for the MBSFN subframes by scheduling in a way that maximizes the quality of service despite the reduced scheduling flexibility that the MBSFN subframes impose.

Figure 3: Evaluation score as a function of number of iterations for scenario 1 with MBSFN subframes.

IV-B Scenario 2: Periodic high interference

In this scenario, we investigate the controller’s ability to learn to account for future high interference on one of the users during planning. Periodic high interference can, for instance, occur in case a user is at the cell edge and is interfered by another base station or in case of unsynchronized time division duplexing scenarios.

IV-B1 Scenario description

We consider two users, one NR user and one LTE user, both with a traffic arrival periodicity of 2 ms. We assume a larger packet size for the NR user than for the LTE user, so that we can observe the gain of NR benefiting from the two extra symbols of LTE PDCCH when it is allocated the full bandwidth. Users have a small weight when the delay is less than 2 ms, after which the weight increases abruptly to 2 (i.e., a step delay of 2 ms and a step weight of 2). Moreover, periodic high interference is observed on the LTE user every 3 subframes. Here, the periodic interference term is added artificially for analysis purposes.

IV-B2 Optimal bandwidth split

The optimal strategy for this scenario is to allocate the full bandwidth to NR during subframes with high interference on the LTE user.

IV-B3 Results and analysis

From Figure 4, we can see that the proposed algorithm converges to the optimal strategy in 18 and 28 iterations for action spaces of size 2 and 3, respectively. The proposed approach outperforms the baseline algorithm, the equal bandwidth split, and the alternating bandwidth split: the controller learns to allocate the full bandwidth to NR during subframes with high interference on the LTE user, as opposed to taking actions based on buffer status only. This allows the controller to split the bandwidth between LTE and NR such that the impact of interference from neighboring cells is reduced, thus improving system-level performance.

Figure 4: Evaluation score as a function of number of iterations for scenario 2 with periodic high interference.

IV-C Scenario 3: Mixed services

In this scenario, we investigate the controller’s ability to handle users with different delay requirements.

IV-C1 Scenario description

We consider two users: one high-priority NR user with 90000 bits and one low-priority LTE user with 90000 bits. Data arrives in subframe 1 for both users.

IV-C2 Optimal bandwidth split

The optimal strategy for this scenario is to postpone the scheduling of the low-priority LTE user in order to allow the high-priority NR user to be scheduled first. Once the buffer of the high-priority user is emptied, the controller can schedule the LTE user.

IV-C3 Results and analysis

From Figure 5, we can see that the proposed approach converges to the optimal policy within 5 iterations. The controller learns to prioritize the NR user with a tight delay requirement over the LTE user thus outperforming the baseline algorithm as well as the equal BW split.

Figure 5: Evaluation score as a function of number of iterations for scenario 3 with mixed services.

IV-D Scenario 4: Time multiplexing

In this scenario, we investigate the controller’s ability to learn to do time multiplexing (as opposed to frequency multiplexing) between LTE and NR. Time multiplexing can result in two extra symbols for NR when no LTE is scheduled due to the fact that no LTE PDCCH needs to be transmitted. This in turn results in an increased efficiency when the RATs are scheduled in a time multiplexed fashion.

IV-D1 Scenario description

We consider two users, one NR user and one LTE user, both with a traffic arrival periodicity of 2 ms. The packet size for the NR user (14000 bits) is larger than that of the LTE user (10000 bits). Users have a small weight when the delay is less than 2 ms, after which the weight increases abruptly to 5 (i.e., a step delay of 2 ms and a step weight of 5).

IV-D2 Optimal bandwidth split

The optimal strategy for this scenario is to perform time multiplexing whereby the full bandwidth is allocated to a particular RAT every other subframe. As such, NR could benefit from the 2 extra symbols of LTE PDCCH when it is given the full bandwidth. This results in a larger transport block size and thus the large NR packet size can be served within one subframe.

IV-D3 Results and analysis

From Figure 6, we can see that the proposed approach converges to the optimal action strategy within 14 and 15 iterations for the case of 3 and 4 actions, respectively. The proposed approach outperforms the baseline algorithm, equal BW split, and alternating BW split where the network learns to perform time multiplexing between LTE and NR resulting in an increased spectrum efficiency. For the studied scenario, when NR is scheduled alone, i.e. without overhead from LTE PDCCH, the maximum transport block size is 14112 bits. On the other hand, when LTE is scheduled with NR, there is an extra overhead for LTE PDCCH and thus the maximum transport block size decreases to 12576 bits. Consequently, the NR packet can be scheduled in one subframe given that NR is scheduled alone during that subframe.

Figure 6: Evaluation score as a function of number of iterations for scenario 4 considering a time multiplexing scenario.

V Conclusion

In this paper, we have proposed a novel AI planning framework for dynamic spectrum sharing of LTE and NR. Results have shown that the controller can split the bandwidth between LTE and NR in an intelligent way while accounting for future network states, such as MBSFN subframes and high interference level, thus resulting in an improved system level performance. This gain comes from the fact that the proposed algorithm uses knowledge (or beliefs) about future network states to make decisions that perform well on a longer timescale rather than being greedy in the current subframe. As part of future work, we aim to further investigate if the suggested algorithm can learn to account for uncertainties in the observations.

References

  • [1] Ericsson, “Sharing for the best performance - stay ahead of the game with ericsson spectrum sharing,” Ericsson white paper, 2019.
  • [2] E. Dahlman, S. Parkvall, and J. Skold, 4G: LTE/LTE-Advanced for Mobile Broadband, 1st ed.   USA: Academic Press, Inc., 2011.
  • [3] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The Next Generation Wireless Access Technology, 1st ed.   USA: Academic Press, Inc., 2018.
  • [4] U. Challita, L. Dong, and W. Saad, “Proactive resource management for LTE in unlicensed spectrum: A deep learning perspective,” IEEE Transactions on Wireless Communications, vol. 17, no. 7, pp. 4674–4689, July 2018.
  • [5] 3GPP TR 38.889, “Study on NR-based access to unlicensed spectrum.”
  • [6] U. Challita, W. Saad, and C. Bettstetter, “Interference management for cellular-connected UAVs: A deep reinforcement learning approach,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2125–2140, March 2019.
  • [7] A. Khawar, A. Abdelhadi, and C. Clancy, Spectrum Sharing Between Radars and Communication Systems.   Springer, 2018.
  • [8] S. Kinney, “Dynamic spectrum sharing vs. static spectrum sharing,” RCR wireless, March 2020.
  • [9] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.
  • [10] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” 2017.
  • [11] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver, “Mastering atari, go, chess and shogi by planning with a learned model,” arXiv:1911.08265, Nov. 2019.
  • [12] 3GPP, “Study on channel model for frequencies from 0.5 GHz to 100 GHz,” 3GPP TR 38.901, V15.0.0, June 2018.
  • [13] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); Further advancements for E-UTRA physical layer aspects,” 3rd Generation Partnership Project (3GPP), Technical Report (TR) 36.814, 03 2017, version 9.2.0.