I Introduction
Efficient allocation of system resources among the users of a network is key to improved system performance. Resource allocation refers to, for example, the allocation of resource blocks in the time or frequency domain, of transmit power, and/or of the modulation and coding scheme for payload transmission to a specific user in the system. The task of determining the appropriate resource allocation relies on the information about the propagation environment acquired by the transmitter(s), as well as on the available computational power in the system. Traditionally, the users’ channel state information (CSI) available at the transmitters is utilized for resource allocation, where heuristic approaches can be used to optimize the practical implementation [1]. CSI acquisition incurs an overhead that varies with the system’s user density. The last few decades have seen a surge in user traffic demands, along with immensely increased user density, in wireless communication systems. This implies that instantaneous CSI acquisition will incur significant overhead in systems with high user density, where the channel varies frequently due to the propagation environment and user mobility. Furthermore, high user density leads to increased computational complexity for efficient resource allocation. This gives rise to a need for alternative approaches to resource allocation, such that the overhead for acquiring information about the propagation environment is minimal and, at the same time, the computational complexity required to optimize the resource allocation stays within certain constraints.
According to a recent survey [2], several research works in the last decade have focused on applying various machine learning frameworks to resource allocation in wireless systems. In addition to the related work mentioned in [2], the work in [3] applies a deep neural network to cognitive radio modulation recognition. The authors in [4] combine unsupervised feature learning with supervised classification techniques to design an efficient quality-of-experience-based video admission control and resource allocation scheme. Deep reinforcement learning is used in [5] to propose a power-efficient resource allocation scheme for wireless systems based on a cloud radio access network architecture. All these works highlight the advantages of exploiting machine learning for resource allocation in wireless systems, without compromising the computational capacity of the system.
Besides relying on CSI for optimizing the performance of wireless systems, various research works propose the use of position information in this context. The majority of these works discuss position-based schemes for improving the system performance at different layers of the protocol stack [6, and references therein], [7], but only a handful of them consider position information for resource allocation. It should be noted that the position information can be acquired through narrowband uplink pilots, as opposed to the full-band pilots used for CSI acquisition, resulting in a considerably lower overhead. An overview of position-based resource allocation to improve the performance of wireless systems is provided in [8], which stresses that the true potential of position-based techniques needs to be evaluated by comparison with CSI-based methods. In [9], position-related information (i.e. angle-of-arrival) of mobile terminals is utilized for spatial filtering in an ultra-dense network to maximize throughput. A similar approach is used in [10], where the authors propose location-aided beamforming by exploiting the distance between the base station and the mobile relay for beam selection to serve high-speed users. Other works like [11] and [12] consider location-based resource allocation in conjunction with CSI. To the best of our knowledge, none of the previous works has considered resource allocation based solely on the position coordinates of the mobile terminal. Furthermore, most of the above studies do not account for stochastic variations affecting the position-related information, but rather assume that the position is perfectly known without considering estimation errors.
Our recent work [13] presents initial results on the feasibility of a coordinates-based resource allocation scheme; the main open question, which we address in this work, concerns the associated implementation constraints in a real-time system.
We focus on a simple system comprising a single transmitter serving a single mobile terminal over a dominant line-of-sight (LoS) communication link. This implies that the channel characteristics of the propagation environment of the mobile terminal are related to its position. We apply a supervised machine learning framework to learn this relationship and design a coordinates-based resource allocation scheme that maximizes the transport capacity of the system. This study extends our previous work [13] to comprehensively address the applicability of different machine learning frameworks to model and perform coordinates-based resource allocation. More specifically, the main contributions of our work are as follows:

We present a detailed description of the coordinates-based resource allocation scheme through machine learning proposed in [13], and discuss the different possibilities for dataset formulation and the associated challenges.

We investigate the applicability of the proposed coordinates-based resource allocation scheme using the different dataset formulations. Based on the best choice of dataset formulation, we investigate the performance of the proposed scheme with respect to the stochastic variations in system characterization.

In terms of implementation constraints, we present an analysis of the time required to train the proposed coordinates-based resource allocation scheme through machine learning. In particular, we consider a realistic system simulation with correlated channels, and determine the training time necessary for the proposed scheme to achieve a system performance comparable to that of the CSI-based resource allocation scheme.
The rest of the paper is structured as follows: Section II describes the system model considered in this work, while Section III discusses the design of the proposed resource allocation scheme, along with the details of the machine learning frameworks used in this work. The different dataset formulations together with their analysis are discussed in Section IV. Section V presents the results and relevant discussions, along with the analysis of the training time required for real-time implementation of the proposed resource allocation scheme. Section VI concludes the paper.
II System Model
In this section we first describe the basic system model, followed by a formal definition of the resource allocation problem. This problem definition is based on an optimization function which aims at maximizing the transport capacity of the system.
II-A Basic System Model
We focus on the downlink communication between a single base station (BS), and a single user terminal. We assume that the system operates on time frames of duration [ms]. OFDM waveform is considered for payload transmission, with the BS utilizing a bandwidth spanning a number of subcarriers . The BS is equipped with transmit antennas, collectively operated with a constant transmit power denoted by [W]. The user terminal is assumed to have receive antennas.
We consider a time-varying wireless communication channel between the BS and the terminal, subject to pathloss, shadowing and fading. Let us denote by the MIMO channel matrix between all transmit and receive antenna elements at time and for subcarrier . We assume that the MIMO channel stays constant during a single time frame and that the user terminal is not exposed to any interference. With these assumptions in mind, if the BS applies a transmit beam at time to transmit the symbol over subcarrier , the received signal at the terminal, when the terminal applies a receive filter , will be given by:
(1) 
where represents the additive white Gaussian noise, and represents the Hermitian of . Based on the received signal in (1), the signal-to-noise ratio (SNR) for the terminal served by the BS at time for subcarrier , with noise power , is given by:
(2) 
For the transmission of payload to the receiver, the BS applies a modulation and coding scheme (MCS) over all the subcarriers. The chosen MCS is applied uniformly for all the subcarriers, meaning that the system does not feature adaptive MCS per subcarrier. We assume a full-buffer state at the BS with respect to the terminal. Therefore, the payload transmission based on the choice of MCS carries a maximum possible number of bits, denoted by . Due to noise, the transmitted bits can be received erroneously at the terminal, which can be modelled by an error function . This results in a transport capacity (or goodput) at the terminal, by which we measure the performance, and which is given by:
(3) 
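As a rough illustration of the chain from (1) to (3), the sketch below computes the SNR on one subcarrier for a given beam and filter pair, and the resulting goodput. The function and argument names are hypothetical, and the specific forms used (SNR as transmit power times the squared magnitude of the filtered channel over the noise power, goodput as payload bits times one minus the error rate) are assumptions for illustration rather than the exact expressions of this paper.

```python
def snr_per_subcarrier(H, b, u, tx_power, noise_power):
    """SNR on one subcarrier: tx_power * |u^H H b|^2 / noise_power.

    H: channel matrix as a list of rows (receive x transmit antennas),
    b: transmit beam, u: receive filter (both assumed to have unit norm).
    Assumption: tx_power is the power available on this subcarrier.
    """
    # Apply the transmit beam: one complex value per receive antenna.
    Hb = [sum(h_rt * b_t for h_rt, b_t in zip(row, b)) for row in H]
    # Apply the Hermitian of the receive filter.
    eff = sum(u_r.conjugate() * hb_r for u_r, hb_r in zip(u, Hb))
    return tx_power * abs(eff) ** 2 / noise_power


def goodput(payload_bits, error_rate):
    """Expected number of correctly received bits per frame (assumed form)."""
    return payload_bits * (1.0 - error_rate)


# Toy example: 2 transmit antennas, 1 receive antenna.
H = [[1 + 0j, 1j]]
b = [2 ** -0.5, -1j * 2 ** -0.5]   # beam matched to the channel
u = [1 + 0j]
snr = snr_per_subcarrier(H, b, u, tx_power=1.0, noise_power=0.1)
```

With the matched beam of the toy example, the two antenna contributions add coherently, which is the effect that beam selection in the resource allocation tries to achieve.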
II-B Resource Allocation and the Position Information
We consider the maximization of transport capacity as the optimization problem in this work. The transport capacity at the terminal depends on the allocation of available system resources between the BS-terminal pair. We denote the resource allocation by , which comprises the following: (a) the transmit beam applied by the BS, (b) the receive filter applied by the terminal, and (c) the MCS chosen by the BS. The transmit beam and the receive filter are chosen from the finite sets and , respectively. The sets and are predetermined using geometric beamforming, with an angular separation , at both the BS and the user terminal, respectively. The MCS value is chosen from the finite set , which is based on the link-to-system mapping given in [14]. Since the downlink communication between a single BS and user terminal is the focus of our work, we allocate all the time and frequency resources, as well as the full transmit power, to the single terminal. Based on the above description, the optimization problem for resource allocation per downlink frame can be stated as:
(4) 
Traditionally, the resource allocation problem is solved based on the CSI of the BS-terminal pair, which requires the CSI to be perfectly known at the BS. This means that the system has to apply full-bandwidth signals with frequent signaling, but in reality the signaling bandwidth is limited and signaling resources are scarce. Due to this limitation and scarcity, CSI estimation would result in a lower transmission rate than required per user, as well as in outdated CSI. Outdated CSI results in a highly inefficient resource allocation and, consequently, a deterioration of the system performance. This is prevalent in scenarios with many users and with fast channel changes due to user mobility and a high density of scatterers. To mitigate these problems, an alternative approach is needed for efficient resource allocation that maximizes the system performance.
In this work, we consider systems where an estimate of the terminal’s position can be obtained at the BS in addition to CSI acquisition. This position estimate can be determined, for example, by Kalman filtering of the direction-of-arrival and time-of-arrival of positioning beacons sent specifically for this purpose in the uplink [9]. These positioning beacons are in fact narrowband signals, which incur significantly lower overhead than CSI estimation beacons in high user density scenarios, as mentioned in [15]. Let denote the true position coordinates of the terminal at time , while denotes the estimate of the position coordinates of the terminal. Assuming that a dominant LoS link exists between the BS and the terminal, with no interference at the terminal, the channel characteristics remain fairly constant and are related to the terminal’s position estimate known at the BS. However, this relationship is affected by the fact that the position of the terminal cannot be accurately known at the BS at all times, as well as by the presence of scatterers in the propagation area. In this work, we first investigate under which conditions a relationship between the position estimate of the terminal and the channel state exists and, if so, how this relationship can be exploited to maximize the transport capacity . This question forms the basis of our research problem, which is presented in the next subsection.
II-C Problem Statement
As mentioned before, maximum transport capacity is achieved when resource allocation is determined optimally based on perfectly known CSI at the BS. However, depending on the propagation scenario, instantaneous CSI acquisition can be costly or the available CSI estimates can be outdated, both of which are detrimental to CSI-based resource allocation. For propagation scenarios where a dominant LoS exists between the BS-terminal pair, the relationship between the estimated position coordinates of the terminal and the downlink channel state can be exploited to determine the resource allocation for the BS-terminal pair when solving the optimization problem (4). Intuitively, with the estimated position of the mobile terminal available at the BS, geometric beamforming can be applied to determine the resource allocation . This means that the transmit beam and the receive filter are determined based on , whereas the MCS can be determined based on the distance between the BS and the terminal. However, this approach suffers whenever the position information available at the BS is inaccurate, and it is also affected by the presence of scatterers in the propagation area. Furthermore, the geometric approach is sensitive to the antenna radiation profiles and the antenna orientation at both the BS and the terminal. In this work, we apply supervised machine learning to design a coordinates-based resource allocation scheme to solve the capacity maximization problem. In particular, we discuss how supervised machine learning can be used to determine the resource allocation from the terminal’s position estimates , and how such a coordinates-based resource allocation scheme can be implemented in a wireless communication system. Furthermore, we investigate the computational cost associated with the implementation of the proposed scheme in a realistic system setup.
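The geometric baseline mentioned above can be sketched as follows: the azimuth from the BS to the estimated terminal position is computed and the closest beam of a fixed codebook is selected. The function name, the two-dimensional geometry, and the codebook of steering angles are assumptions made for illustration only; angle wrap-around and antenna orientation are deliberately ignored.

```python
import math


def geometric_beam_index(bs_xy, pos_xy, beam_angles_deg):
    """Index of the codebook beam whose steering angle is closest to the
    azimuth of the (estimated) terminal position as seen from the BS."""
    azimuth = math.degrees(math.atan2(pos_xy[1] - bs_xy[1],
                                      pos_xy[0] - bs_xy[0]))
    return min(range(len(beam_angles_deg)),
               key=lambda i: abs(beam_angles_deg[i] - azimuth))


# Hypothetical codebook: steering angles from -90 to 90 degrees in 3 degree steps.
codebook = list(range(-90, 91, 3))
idx = geometric_beam_index((0.0, 0.0), (1.0, 1.0), codebook)
```

An error in the position estimate shifts the computed azimuth directly, which is one way to see why this purely geometric selection degrades with inaccurate position information.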
III Coordinates-based Resource Allocation Using Machine Learning
In this section, we first present the design and working of the coordinates-based resource allocation scheme. We also discuss some challenges related to the design of the proposed scheme, and the different possible solutions we considered. Afterwards, we outline the different supervised machine learning algorithms used for coordinates-based resource allocation, and the motivation for their choice.
III-A Design and Working of the Proposed Scheme
As mentioned before, we apply a supervised machine learning framework for designing the coordinates-based resource allocation scheme. This implies that the samples collected for training the machine learning framework comprise input parameters and an output, which will be predicted by the learnt model. Fig. 1 shows the design and working of the coordinates-based resource allocation scheme. The scheme consists of two modes, namely the training-based mode and the position-based mode. In the beginning, the system operates in the training-based mode, where the data collection process happens simultaneously with the CSI-based resource allocation. In this mode, both the estimate of the terminal’s coordinates and its CSI are collected by the system for a period of time to construct the training samples. This information is processed offline, where the CSI for each collected sample is used to determine the resource allocations that maximize the system’s transport capacity. Once all the samples are processed, the training dataset is formulated by associating each position estimate , the input, with the corresponding resource allocations , the output. These samples are used to train the supervised machine learning frameworks, and the resulting learning models are then used for prediction by the system.
After the training process is complete, the system operates in the position-based mode. In this mode, only the terminal’s position estimate for each time frame is assumed to be known. This estimate is passed to the trained machine learning model, and the resource allocation is determined from the model’s prediction. To ensure efficient performance, the predictions from the learnt model need to be checked against the baseline CSI-based resource allocation from time to time. If the goodput achieved with the predicted resource allocations is not in line with the one given by the CSI-based resource allocation, the system switches back to the training-based mode to retrain the machine learning model. The exact modelling of this mode-switching process is out of the scope of this work. However, towards the end of this paper, we provide an intuition about the time needed to collect a sufficient amount of training samples and to train the machine learning model for a stable learning performance.
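The switching between the two modes can be pictured as a small state machine. The sketch below is only one possible reading of the description above: the function name, the spot-check interface and the threshold `min_ratio` are hypothetical, since the exact switching rule is left unspecified in this work.

```python
TRAINING, POSITION = "training-based", "position-based"


def next_mode(mode, predicted_goodput=None, csi_goodput=None,
              training_done=False, min_ratio=0.9):
    """Return the mode for the next period.

    min_ratio is a hypothetical tolerance: the predicted allocation must reach
    at least this fraction of the CSI-based goodput during a spot check.
    """
    if mode == TRAINING:
        # Stay in training until enough samples are collected and the model is fit.
        return POSITION if training_done else TRAINING
    # Position-based mode: only react when a CSI-based spot check is available.
    if predicted_goodput is not None and csi_goodput is not None:
        if predicted_goodput < min_ratio * csi_goodput:
            return TRAINING      # predictions drifted, so collect data and retrain
    return POSITION
```

A tighter tolerance triggers retraining more often, which trades additional CSI overhead for a smaller loss in goodput.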
With respect to the machine learning frameworks, the major design challenge relates to the representation of inputs and outputs in the dataset. Two basic representations of the input variables are possible. One is to treat the propagation scenario as a binary-coded image, where the position estimate of the terminal is marked with a 1, while the rest of the image is coded as 0’s. The other is to use the coordinates of the terminal’s estimated position as the input vector for the learning framework. The image-based representation is especially suited to neural-network-based learning methods, but because the coded vector is highly sparse, as it indicates the position estimate of only a single terminal, a huge amount of data samples is required to train the neural network appropriately. The need to collect such a large number of samples renders this data representation impractical for implementation. In contrast, the other approach uses the estimated coordinates of the terminal itself as input, which are stored as floating-point numbers in the system. However, this representation entails a low-dimensional input vector, and therefore deep learning architectures cannot be used with it. We therefore choose the low-dimensional input vector representation, used with other learning frameworks, because it is feasible in a real-time setup. Inspired by this input representation, we choose to encode the resource allocation as a binary string, where the different parts of the string encode the individual resource variables’ information.
Besides data representation, another issue relates to the dataset formulation itself, considering the association between input and output variables, or classes. In our case, the formulation is unusual since a single position estimate can be associated with multiple resource allocations, or classes, that maximize the transport capacity. In general, the datasets used for machine learning have a unique relationship between inputs and outputs, but this is not the case in our setup. With these restrictions in mind, different methods for dataset formulation can be considered. One formulation is based on designing the dataset for a binary classification problem, but this implies that the output variable can take only one of two possible values. This is not possible when the resource allocation is the output variable, and hence rules out the use of the support vector machine algorithm, which is fundamentally a binary classifier [16]. In this work, we consider dataset formulations based on a single output variable per sample; the details of the different possible formulations can be found in Section IV-C. With the above choice of data representation and dataset formulation, we now present in detail the machine learning frameworks used in our work.
III-B The Machine Learning Frameworks
The supervised machine learning domain mainly comprises the following well-known algorithms, ordered from lowest to highest complexity: K-nearest neighbors (KNN), support vector machines, random forest (RF) and neural networks. Based on the chosen representation of the input and output variables in the dataset formulation, we use the K-nearest neighbors and random forest algorithms as the machine learning frameworks in our work. The details of the KNN and RF algorithms are given below.
III-B1 K-Nearest Neighbor Algorithm
KNN is one of the simplest machine learning frameworks [17] and does not build an explicit model for predicting the classes. Instead, the whole training dataset itself is used for class prediction. For a given sample of the test dataset, KNN first determines the K samples in the training data that are closest to the test sample. It then performs a majority vote on the classes associated with these K nearest neighbors to predict the outcome for the given test sample. In the context of coordinates-based resource allocation, given a training dataset with sufficient sampling of the terminal’s coordinates, KNN can capture the spatial relationship between the terminal’s position coordinates and the respective resource allocation quite accurately.
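The KNN prediction step described above can be sketched in a few lines. This is a minimal illustration under assumptions: positions are two-dimensional tuples, class labels are integers standing for resource allocations, the distance is Euclidean, and the names and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training positions (Euclidean distance)."""
    # Indices of the training samples, sorted by distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: two spatial clusters, each associated with one allocation label.
train_X = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
train_y = [5, 5, 5, 9, 9, 9]
label = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
```

Note that no training step exists here: every prediction scans the stored samples, so the prediction cost grows with the size of the training dataset.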
III-B2 Random Forest Algorithm
Random Forest [18] is a more complex supervised learning algorithm which, unlike KNN, builds a learning model for class prediction. The model consists of an ensemble of randomized binary decision trees, where each decision tree is constructed using a dataset obtained by sampling with replacement from the available training data. An individual sample in the training data is called an input feature vector, and consists of the input features along with the output variable(s). Each decision tree in the forest consists of a root node, several interior nodes and terminal leaf nodes. The thresholds for each node are determined based on a subset of randomly selected input features , which induces randomness in each decision tree of the RF model. Overall, number of trees are constructed, where each tree is grown either to a maximum depth of or until all the classes are perfectly separated. The predicted class is based on a majority vote on the classes predicted by all the trees in the forest.
For coordinates-based resource allocation, whenever a new estimate of the terminal’s position is available to the RF model, it is passed through all the trees in the forest. The resource allocation prediction is made by taking the mode of the resource allocations, i.e. the classes, predicted by all the trees in the forest. The ensemble of decision trees provides robustness to the learnt model, and RF is therefore more robust to noisy inputs than other machine learning frameworks. This property makes RF even more attractive for cases where only noisy estimates of the terminal’s position coordinates are available for determining the resource allocation that maximizes the system performance. The randomness introduced by the random selection of input features for constructing an individual tree prevents the RF algorithm from overfitting the training dataset. For these reasons, the RF algorithm is expected to perform better than KNN, particularly when erroneous estimates of the terminal’s coordinates are available or when the propagation scenario involves randomness in the channel between the BS-terminal pair.
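To make the ingredients above concrete (bootstrap sampling, a random feature subset at each node, depth-limited trees, and a majority vote over the forest), a compact from-scratch sketch is given below. It is an illustrative simplification, not the implementation used in this work: the Gini impurity split criterion, the default parameter values and all names are assumptions, and class labels are assumed to be integers.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, n_features, max_depth, rng):
    """Grow one randomized binary tree; leaves are labels, nodes are tuples."""
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), n_features):    # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:       # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                       # nothing left to split on
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    children = [build_tree([X[i] for i in part], [y[i] for i in part],
                           n_features, max_depth - 1, rng) for part in (left, right)]
    return (f, t, children[0], children[1])


def predict_tree(node, x):
    """Descend from the root until a leaf (a plain label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, n_features=1, max_depth=8, seed=0):
    """Train each tree on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote (mode) over the predictions of all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Toy data: positions in two separate areas, one allocation label per area.
X = [(float(i), float(j)) for i in range(4) for j in range(4)]
X += [(i + 10.0, j + 10.0) for i in range(4) for j in range(4)]
y = [0] * 16 + [1] * 16
forest = fit_forest(X, y)
```

Because each tree sees a different bootstrap sample and different feature subsets, individual trees place their thresholds differently, and the vote averages out these differences.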
This section discussed the design of the coordinates-based resource allocation scheme, along with the challenges involved in tailoring the proposed scheme to supervised machine learning frameworks. Since the resource allocation problem is a multi-class classification problem, support vector machines are not considered in this work. Furthermore, based on the input data representation, we rule out the use of neural networks for designing the proposed scheme. In the next section, we discuss the specific models used to generate the training datasets, followed by the details of the different dataset formulations used in our work.
IV The Datasets
In this section, we will first mention the different models considered for simulating the propagation scenario, specific to the system model described in Section II. Next we present the methodology for simulating the channel model to generate the datasets. Afterwards, we discuss the different dataset formulations to solve the coordinatesbased resource allocation problem using machine learning. At the end, we will present the analysis of datasets for some baseline propagation scenarios.
IV-A Scenario Description
We consider a small street section of m with a single BS serving a single mobile terminal as the propagation scenario, shown in Fig. 2. The BS is placed at the lower right end of the street, 3 m off the roadside. The mobile terminal is placed randomly over the street (random drop), with uniform distribution over the entire street section. We define a parameter [/m] to specify the maximum density of scatterers in the considered propagation scenario. As an example, if 0.05/m, up to 5 scatterers will be randomly placed in the propagation environment. The placement as well as the number of scatterers will vary for each random drop of the terminal. The BS is equipped with transmit antennas, while the terminal has receive antennas. The antenna elements at both the BS and the terminal are Hertzian dipoles that form a uniform linear array oriented along the axis. The BS antenna elements are collectively operated with a power of 1 W, and are placed at a height of 10 m from the ground. The terminal antennas are at a height of 1.5 m from the ground, which remains constant for all the simulation scenarios. The system operates at a center frequency of 3.5 GHz over a bandwidth of 200 MHz. The transmission time interval of the system is set to 0.2 ms.
To calculate the transport capacity in (4), we apply the link-to-system mapping error function for , , that emulates the erroneous reception of the transmitted bits at the receiver. Here, is the exponential effective SNR mapping [19], which converts the SNR value per subcarrier into an equivalent SNR over all the subcarriers, with respect to the considered communication scenario. The link-to-system mapping error function provides an error rate that is specific to a range of values for a given MCS . Hence, the link-to-system model represents the relationship between the block error rate, the effective SNR, the MCS and the transmitted payload size.
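For readers unfamiliar with the exponential effective SNR mapping, a minimal sketch of its commonly used form is shown below: the per-subcarrier SNRs (in linear scale) are averaged in the exponential domain and mapped back. The function name is hypothetical, and the calibration parameter `beta`, which in practice is tuned per MCS, is an assumed input; the exact parametrization used in [19] may differ.

```python
import math


def eesm(snrs_linear, beta):
    """Exponential effective SNR mapping:
    -beta * ln( mean( exp(-snr_n / beta) ) ) over all subcarriers n."""
    mean_exp = sum(math.exp(-s / beta) for s in snrs_linear) / len(snrs_linear)
    return -beta * math.log(mean_exp)


flat = eesm([4.0, 4.0, 4.0], beta=2.0)        # frequency-flat channel
selective = eesm([1.0, 9.0], beta=2.0)        # frequency-selective channel
```

For a flat channel the effective SNR equals the per-subcarrier SNR, while for a selective channel it falls below the arithmetic mean because weak subcarriers dominate the error behaviour.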
In terms of the system resources, the sets of transmit beams and receive filters are determined using geometric beamforming, with an angular separation of 3° and 12°, respectively. The finite set of MCS values comprises 15 different values, and is based on the link-to-system mapping in [14]. The optimal resource allocations (RAs) are determined by solving Problem (4) through exhaustive search.
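The exhaustive search over beams, filters and MCS values can be sketched as a loop over the Cartesian product of the three index sets. In the sketch below, `goodput_fn` is a hypothetical callback standing in for the evaluation of the transport capacity of one allocation, and the set sizes in the toy example are arbitrary. All allocations that tie for the maximum are returned, since several allocations may be optimal for the same position.

```python
from itertools import product


def exhaustive_search(n_beams, n_filters, n_mcs, goodput_fn):
    """Return (best goodput, list of all index triples achieving it)."""
    best, argbest = float("-inf"), []
    for ra in product(range(n_beams), range(n_filters), range(n_mcs)):
        g = goodput_fn(*ra)
        if g > best:
            best, argbest = g, [ra]       # strictly better: restart the list
        elif g == best:
            argbest.append(ra)            # tie: keep every optimal allocation
    return best, argbest


def toy_goodput(beam, rx_filter, mcs):
    """Toy model: beam 1 points at the terminal, the filter does not matter,
    and MCS indices above 1 fail to decode."""
    return (10 if beam == 1 else 0) + (mcs if mcs < 2 else 0)


best, optimal_ras = exhaustive_search(3, 2, 3, toy_goodput)
```

In this toy case two allocations are returned, differing only in the receive filter. With real-valued goodputs, an exact equality test for ties may need to be replaced by a tolerance.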
Another important parameter is the error in the position estimate for the system model presented in Section II-A. The position estimation error is modelled as a zero-mean Gaussian random variable with variance . With respect to the communication system, this position error depends on the accuracy with which the direction-of-arrival and time-of-arrival parameters are estimated, where the former depends on the antennas’ geometry, while the latter is related to the pathloss between the BS-terminal pair. In addition to the aforementioned modelling parameters, the most important choice is that of the channel model, which we present in the next subsection.
IV-B The Channel Model
We resort to simulations for generating the datasets used to determine the resource allocation from the estimated position coordinates of the terminal when solving the optimization problem (4). To the best of our knowledge, only received-signal-strength information is available in the publicly accessible traces on various open-source platforms, with no details about the position of the terminal. Therefore, one of the major challenges for data generation is to choose simulation models that emulate real-time measurements as closely as possible. Hence, we utilize the ray-tracer channel model [20] to generate the MIMO channel matrix for various parametrizations. This channel model has been validated for different propagation scenarios, as mentioned in [20], and is a state-of-the-art channel model for next-generation wireless communication systems.
The ray-tracer channel model considers a number of multipath components existing in the downlink communication between the BS-terminal pair, for each time and subcarrier . These multipath components arise due to different wave propagation phenomena, including reflection, diffraction and scattering, which are affected by the presence of scatterers in the propagation environment. The radiation patterns of the BS and terminal antennas are also taken into account by the ray-tracing model. We denote by the impulse response for multipath component , between each BS antenna element and each terminal’s antenna element , which captures all the aforementioned propagation effects in addition to the relevant pathloss. The channel impulse response is then the sum of the impulse responses of all the different multipath components, i.e.
(5) 
Here, is the total number of multipath components, is the wavelength corresponding to the center frequency , and is the frequency of the subcarrier . is the total distance for multipath at time , and denotes the delay for multipath . A detailed implementation of this channel model can be found in [9].
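The idea of summing the contributions of all multipath components can be illustrated with a generic narrowband model for a single antenna pair. The sketch below is not the ray-tracer model of [20]: the free-space amplitude term, the optional per-path gains (standing in for reflection losses and antenna patterns), and all names are assumptions made for illustration.

```python
import cmath
import math

C = 299_792_458.0  # speed of light [m/s]


def channel_coefficient(path_lengths_m, f_subcarrier_hz, f_center_hz, gains=None):
    """Sum of multipath contributions for one antenna pair and one subcarrier.

    Each path contributes gain * wavelength / (4 * pi * d) in amplitude and a
    phase rotation exp(-j * 2 * pi * f * tau), with delay tau = d / c.
    """
    wavelength = C / f_center_hz
    gains = gains or [1.0] * len(path_lengths_m)
    h = 0j
    for d, g in zip(path_lengths_m, gains):
        tau = d / C                                   # propagation delay of the path
        amplitude = g * wavelength / (4 * math.pi * d)
        h += amplitude * cmath.exp(-2j * math.pi * f_subcarrier_hz * tau)
    return h


# LoS path of 100 m plus one weaker reflected path of 130 m at 3.5 GHz.
h = channel_coefficient([100.0, 130.0], 3.5e9, 3.5e9, gains=[1.0, 0.3])
```

Depending on the path length difference, the reflected component adds constructively or destructively to the LoS component, which is how scatterers perturb the otherwise position-determined channel.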
Based on the scenario description and the choice of the channel model, we define the following baseline cases to generate the primary datasets:

Case 1: When no scatterers are present in the propagation environment and accurate position estimates for the terminal are available. This represents a deterministic channel generation, i.e. the channel between a given terminal position and BS always results in the same channel matrix .

Case 2: When erroneous position estimates are available (specifically, 0.4 m) with no scatterers in the propagation scenario.

Case 3: When the position estimates of the terminal are known accurately and scatterers, with a maximum density of 0.05/m, are randomly placed in the propagation environment.
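The three baseline cases differ only in the position error and the scatterer density, so a single random-drop routine can generate all of them. The sketch below is an assumption-laden illustration: the street dimensions are placeholders, the position error is applied independently to each coordinate with standard deviation `sigma_m`, and the number of scatterers is drawn uniformly up to the maximum; none of these specifics is taken from this paper.

```python
import random


def random_drop(street_len_m, street_width_m, sigma_m, max_scatterers, rng):
    """One random drop: true position, noisy position estimate, scatterer count."""
    # Terminal placed uniformly over the street section.
    true_pos = (rng.uniform(0.0, street_len_m), rng.uniform(0.0, street_width_m))
    # Zero-mean Gaussian position error per coordinate (assumed independent).
    est_pos = tuple(c + rng.gauss(0.0, sigma_m) for c in true_pos)
    # Number of scatterers varies per drop (assumed uniform in 0..max).
    n_scatterers = rng.randint(0, max_scatterers)
    return true_pos, est_pos, n_scatterers


rng = random.Random(1)
case1 = random_drop(100.0, 10.0, sigma_m=0.0, max_scatterers=0, rng=rng)
case2 = random_drop(100.0, 10.0, sigma_m=0.4, max_scatterers=0, rng=rng)
case3 = random_drop(100.0, 10.0, sigma_m=0.0, max_scatterers=5, rng=rng)
```

In Case 1 the estimate coincides with the true position and no scatterers are present, so repeated drops at the same position yield the same channel.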
We now present the methodology for the formulation of datasets used in this work. Afterwards, we analyze the generated datasets for the three cases mentioned above.
IV-C Formulation of the Datasets
We use exhaustive search to determine the optimal resource allocation for each sample in the generated dataset. This implies that for every estimate of the terminal’s coordinates , the exhaustive search is performed offline to determine the optimal . Depending on the propagation conditions, for the system model specified earlier, multiple resource allocations can yield the same transport capacity value, which is optimal for the BS-terminal pair during time . As an example, this can occur when different placements of scatterers result in a range of transmission rates that lead to a similar error probability for payload transmission. Therefore, for certain scenarios, a vector results in an optimal value at time , instead of a scalar . This makes the dataset formulation unconventional compared to the formulations that are traditionally used for supervised learning. In this work, we formulate the datasets with a single output variable per sample, where the following approaches are considered for dataset formulation:
Dataset 1 (): This is the approach we adopted in our previous work [13], where we consider only the first optimal solution for the capacity maximization problem found by exhaustive search to form the dataset. This means that the output variable is the first resource allocation outcome from exhaustive search that maximizes the transport capacity for a given position estimate. This formulation results in a dataset with a unique input-output association.

Dataset 2 (): Here, we consider only the resource allocation of the transport capacity maximization problem that relates to the highest value of for formulating the dataset. This also implies a unique input-output relationship in the dataset used for a learning framework.

Dataset 3 (): In contrast to the above two approaches, we use all possible resource allocations that optimally solve problem (4) for constructing this dataset. This means that the learning algorithm will have to predict a single resource allocation by learning from a non-unique relationship between the position estimates and the resource allocations.
Each of these datasets is generated under a realistic system assumption, where an estimate of the user position as well as its CSI is collected at certain time intervals over a long period of time. This assumption is implemented in simulation by considering random user drops, as mentioned earlier, with uniformly distributed user positions over the entire simulation scenario for a fixed system parametrization. For a given user position, the channel matrix is obtained through the ray-tracer channel model, which is then used to compute the SNR and the effective SNR values based on the three resource allocation variables, i.e. the transmit beam, the receive filter and the MCS. For all the datasets, a dataset sample consists of an input vector and an output value. The input vector comprises the x and y coordinates of the terminal’s position estimate (the z coordinate is not used, since the same receive antenna height is assumed for all the samples). The output value is a number denoting a binary sequence, which encodes the indices of the transmit beam, the receive filter and the MCS corresponding to the resource allocation considered for the specific dataset formulation. Here, a resource allocation denotes a class to be learnt by the learning algorithm. A collection of such samples constitutes the whole dataset, which is then divided into a training dataset and a test dataset for applying machine learning. Based on the dataset formulations outlined before, Dataset 1 and Dataset 2 have a unique input-output association per sample, for both the training and test datasets. For Dataset 3, the input vector is repeated as many times as there are optimal resource allocations for a given position estimate, to construct data samples with a single output value denoting each of the optimal resource allocations per position estimate. The dataset has a structure similar to , with unique input-output association per sample.
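The three formulations can be illustrated with a minimal sketch. It assumes that the exhaustive search returns, for each position estimate, a list of candidate resource allocations with their transport capacity and effective SNR; the tuple layout, the tie tolerance `tol`, and the bit widths used in `encode_ra` are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, float]        # (x, y) estimate of the terminal position
Candidate = Tuple[int, float, float]  # (RA class, transport capacity, effective SNR)
Sample = Tuple[Position, int]         # one dataset sample: input vector and class


def encode_ra(beam_idx: int, filter_idx: int, mcs_idx: int,
              filter_bits: int = 4, mcs_bits: int = 4) -> int:
    """Pack the three indices into one integer class label (assumed bit widths)."""
    return (beam_idx << (filter_bits + mcs_bits)) | (filter_idx << mcs_bits) | mcs_idx


def optimal_candidates(cands: Sequence[Candidate], tol: float = 1e-9) -> List[Candidate]:
    """Return every candidate whose transport capacity ties with the maximum."""
    best = max(c[1] for c in cands)
    return [c for c in cands if best - c[1] <= tol]


def build_datasets(search_results):
    """Build the three formulations from per-position exhaustive-search results."""
    d1: List[Sample] = []
    d2: List[Sample] = []
    d3: List[Sample] = []
    for pos, cands in search_results:
        opt = optimal_candidates(cands)
        d1.append((pos, opt[0][0]))                        # first optimal outcome
        d2.append((pos, max(opt, key=lambda c: c[2])[0]))  # highest effective SNR
        d3.extend((pos, c[0]) for c in opt)                # input repeated per optimum
    return d1, d2, d3
```

In this sketch, the first two formulations always contain one sample per position estimate, whereas the third grows with the number of ties found by the search.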
The above process is repeated multiple times with different settings of the system parametrization variables, to obtain various datasets for each formulation under the assumption of accurate user position availability. For the case of erroneous position estimates, the inputs in the datasets for accurate user positions are replaced by erroneous position estimates, with the error modelled by a zero-mean Gaussian with a specific variance, while the outputs are kept the same as the ones in the accurate position datasets.
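A minimal sketch of this replacement step is given below. It assumes that the error is added independently to each coordinate and that the spread is specified as a standard deviation `sigma_m` in metres; both are assumptions made for illustration, as is the function name.

```python
import random


def add_position_error(samples, sigma_m, seed=None):
    """Perturb the inputs with zero-mean Gaussian error; keep the outputs unchanged."""
    rng = random.Random(seed)
    noisy = []
    for (x, y), label in samples:
        noisy.append(((x + rng.gauss(0.0, sigma_m),
                       y + rng.gauss(0.0, sigma_m)), label))
    return noisy
```

Only the inputs change, so the class labels still refer to the resource allocation that is optimal at the true position.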
IV-D Analysis of the Datasets
After generating the different dataset formulations, we analyze their input-output associations to gain an intuition about the learning performance on a specific data formulation. A total of 125,000 position estimates are generated for Datasets 1, 2 and 3, for each of the three cases mentioned in Section IV-B. Two thirds of these position estimates are used for constructing training datasets and their analysis is presented here, while the rest of the samples are used for test dataset construction.
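The split sizes follow directly from the two-thirds rule. The sketch below assumes a shuffled split with integer rounding towards the training set size; the shuffling and the seed are illustrative assumptions, since the text does not state how the samples are assigned.

```python
import random


def train_test_split(samples, train_num=2, train_den=3, seed=0):
    """Shuffle the samples and keep train_num/train_den of them for training."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_num // train_den
    return shuffled[:n_train], shuffled[n_train:]
```

For 125,000 position estimates this yields 83,333 training samples and 41,667 test samples, which matches the test dataset size quoted later in the evaluation.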
We start by presenting the distribution of the number of samples per class (a class represents a unique resource allocation) for the different dataset formulations. Note that the number of samples per class is the same for cases 1 and 2, since they only differ in the input values. Fig. 3 shows the distribution of the number of samples per class for Dataset 1, for cases 1 and 3, where the x-axis is batched in groups of ten classes for better illustration. We observe an exponential distribution of the number of samples per class, which indicates that the learning can be biased in favor of the dominantly occurring classes. We also observe that the number of classes for case 1 with deterministic channel generation (94) is almost three times smaller than that for case 3 (254), although both are significantly smaller than the total number of generated data samples (125,000). The increased number of classes for case 3 is due to the fact that more scatterers result in more multipath components, which vary with the scatterers’ placement, introducing more randomness in the channel. This means that for a fixed user position with different placement of the scatterers, the channel response varies, and therefore, different RAs are obtained as the first outcome from the exhaustive search.
Fig. 4 shows the distribution of the number of samples per class for Dataset 2. Here again, the x-axis is batched in groups of ten classes for illustration purposes. In contrast to the observation for Dataset 1, an equitable distribution of the number of samples per class exists in Dataset 2. Although the number of classes for case 3 (209) is still higher than that for case 1 (70), it is lower than that observed in Dataset 1, because only the RA related to the highest effective SNR is considered, which implies that the RA does not change significantly based on the random effects present in the wireless channel.
Fig. 5 shows the distribution of the number of samples per class for Dataset 3. Note that the x-axis is batched in groups of 20 classes for better illustration. Here, we also observe an exponential distribution of the number of samples per class for both cases, though the number of classes for both cases in Dataset 3 is almost twice that for . This increase is a result of considering all RA outcomes from the exhaustive search that maximize the transport capacity for a given terminal position. The number of such outcomes varies based on the randomness in the propagation scenarios.
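The batched samples-per-class counts behind such plots can be computed as in the following sketch. Ranking the classes by frequency before batching is an assumption made here for illustration; the text only states that the x-axis is batched in groups of ten or 20 classes.

```python
from collections import Counter


def batched_class_histogram(labels, batch_size=10):
    """Count samples per class, rank classes by frequency, and sum per batch."""
    ranked = sorted(Counter(labels).values(), reverse=True)
    return [sum(ranked[i:i + batch_size])
            for i in range(0, len(ranked), batch_size)]


def number_of_classes(labels):
    """Number of distinct resource allocations that occur in the dataset."""
    return len(set(labels))
```

A strongly decaying list of batch sums indicates a few dominant classes, while roughly equal sums indicate an equitable distribution.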
After observing the distribution of inputs versus classes for the different dataset formulations, we now analyze the distribution of classes in relation to the input parameters. Fig. 6 presents the distribution of classes, i.e. the resource allocations, in relation to the input parameters, i.e. the position coordinates of the terminal, for cases 1, 2 and 3. Note that each resource allocation is depicted by a unique color, which is consistent across all the cases. For case 1, i.e. the deterministic channel, we see that the different classes are separated quite distinctly over most of the considered area. Some overlap between the classes occurs either in the middle part of the street section or in the lower right corner, which is closer to the BS. The former overlap exists due to a higher MCS value maximizing the transport capacity within a specific distance range from the BS, while the latter is a result of the antenna radiation pattern at the BS. Comparing Fig. 6(a) and (b), we observe that the class boundaries become dispersed as the position estimates become erroneous. This dispersion poses a challenge for the learning algorithms in terms of robustness. Considering the channel characteristics, we compare Fig. 6(a) and (c) and notice that the random effects in the propagation scenario do not impact the dataset as severely as the erroneous position estimates do. Though the number of classes in case 3 is three times that in case 1, the class boundaries are still distinctly separated across the street section. Therefore, the performance of the learning algorithm on Dataset 1 is expected to be affected primarily by the erroneous position estimates, rather than by the presence of random scatterers in the propagation scenario.
We now show the distribution of classes with respect to the terminal position estimates for Dataset 2 in Fig. 7. Note that we use the same color coding as in Fig. 6, for better comparison across Dataset 1 and Dataset 2. The most important observation from Fig. 7 is that the classes are much more distinctly separated, with clearly defined boundaries, for case 1. These class boundaries become dispersed due to the erroneous position estimates for case 2. However, the extent of dispersion remains the same as observed for Dataset 1, despite the clear class boundaries observed for case 1. The class boundaries for case 3 become blurred with the introduction of more classes, compared to case 1, but this blurriness is less pronounced than the one observed for case 2, i.e. for erroneous terminal positions.
The representation of the distribution of classes with respect to the terminal positions for Dataset 3 is a complex task, since a single terminal position is associated with multiple resource allocations, or classes. One possible illustration is presented in Fig. 8, where the set of all resource allocations associated with a single terminal position is shown as a unique color. Based on this color coding, 1137 unique sets of resource allocations are identified in Dataset 3, and therefore, a color palette of 1137 colors is used for plotting Fig. 8. We observe the same behavior for cases 1, 2 and 3 as observed previously for Datasets 1 and 2, with one main difference: the blurriness or dispersion of a class boundary does not necessarily imply a performance loss in terms of transport capacity. This is due to the fact that a different color shows a unique set of RAs, with the possibility of certain RAs being common to both unique sets. Because of this, we define a new criterion to determine the performance of the learning algorithms based purely on the dataset characteristics, which will be explained in the later sections.
An alternate representation of the distribution of classes in relation to the terminal positions for Dataset 3 is shown in Fig. 9. Note that we only illustrate the distribution for three of the resource allocations, i.e. the classes, with respect to the terminal positions, to understand how the dataset is viewed by the learning algorithm. The plot shows that the learning algorithm can associate the overlapping class regions with any of the classes for a given terminal position. This implies that the predicted class will result in the same value of maximum transport capacity, as long as the prediction lies within the original and overlapped class boundary. The non-unique association can help the learning algorithm to be robust to erroneous position inputs, as the dispersed class boundaries will still lie in the overlapped region and, therefore, result in a smaller loss in transport capacity compared to that for Datasets 1 and 2.
In this section, we presented the different dataset formulations for supervised machine learning to perform coordinates-based resource allocation. Analysis of the different dataset formulations shows that strong spatial clustering exists, which supports the feasibility of coordinates-based resource allocation through supervised machine learning frameworks. The learning task can be challenging for some dataset formulations, for specific propagation scenarios, which yields interesting results, as discussed in the next section of this paper.
V Evaluation Results and Discussion
In this work, we are interested in evaluating the applicability of coordinates-based RA under different propagation scenarios and system constraints, as well as the computational resources needed for implementing the proposed scheme in a realistic system setup. Based on the scenario description in Section IV-D, the performance of the following schemes is evaluated in our work:

CSI-based RA scheme: This represents the traditionally used RA scheme which relies on the instantaneous CSI of the BS-terminal pair.

Coordinates-based RA scheme using KNN: As discussed in Section III-B, this is the simplest machine learning-based scheme, where we consider .

Geometry-based RA scheme: This is the benchmark scheme, where geometric beamforming is used to determine the transmit beam and receive filter based on the terminal’s coordinates in relation to the BS placement. The MCS is determined statistically based on the terminal’s position coordinates in relation to the geometry of the propagation scenario.
The above performance evaluation is done for different channel characterizations: when no scatterers are present in the propagation environment, or when a number of scatterers up to 5 per 100 m are randomly placed in the propagation environment, and accurate position estimates for the terminal are available to the system. Recall that the former case represents a deterministic channel generation (as mentioned in Section IV-B), while the latter refers to a varying channel generation case. The performance comparison for different data formulations is also done for a fairly deterministic channel generation, where the scatterer density varies up to 1 scatterer per 100 m. To investigate the performance limit of the proposed coordinates-based RA scheme, we consider different degrees of variation in the error associated with the estimated coordinates of the terminal. This variation is defined by and m. We now discuss the tuning of the RF algorithm for generating the results related to the various propagation scenarios considered in our work, followed by the performance results and relevant discussions. At the end, we discuss the implementation of the coordinates-based RA scheme in a realistic system setup and also comment on the computational resources needed for its implementation.
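As an illustration of the KNN-based scheme listed above, the following minimal sketch predicts a resource allocation class from the nearest training positions. The Euclidean distance, the majority vote, and the tie-breaking in favor of the closest neighbour are assumptions made for this sketch; the number of neighbours `k` is left as a parameter.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the RA class of `query` by majority vote among k nearest positions."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties are resolved in favor of the label that was counted first,
    # i.e. the one belonging to the closest neighbour among the tied classes.
    return votes.most_common(1)[0][0]
```

Here `train` is a list of ((x, y), class) pairs, so the whole training dataset acts as the model and no separate training step is needed.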
V-A Tuning of the Random Forest Algorithm
To optimize the performance of the random forest algorithm, we need to tune its parameters. Specifically, we need to decide on the number of trees that make up the forest as well as the maximum depth up to which each tree is built while training the model. In terms of the number of randomly selected input features for building each node of the tree, we resort to the conventional practice, i.e. we choose . Fig. 10 shows the training and test accuracy obtained for different numbers of trees, at varying maximum depth per tree, for the RF algorithm. These tuning results are shown for the maximum scatterers’ density, i.e. 0.05/m, with data formulation . In addition to the traditionally used training and test accuracy metrics, we define a new accuracy metric to determine the performance of the learning algorithm in comparison to the throughput maximization problem. Generally, the test accuracy is computed using a one-to-one comparison between the ground truth label and the predicted label in the test dataset. However, in our work, as mentioned before, each sample of the dataset can have multiple outputs (the RAs) as ground truth labels. Therefore, the test accuracy metric alone cannot determine how well the learning algorithm has been trained. We call the newly defined metric the performance adjusted accuracy, which is determined by comparing the predicted label to the set of all possible labels associated with a test sample, instead of only a single label. Comparing the three accuracy metrics in Fig. 10, we observe that the depth of the trees has the most impact on the learning performance. A shallow depth, such as , for the trees is not sufficient to learn the relationship between the position estimates and the resource allocations, but the depth can be increased only up to a certain extent, such as to prevent the model from overfitting the training dataset. The test accuracy increases till , while the performance adjusted accuracy remains fairly constant.
In addition, the performance does not seem to be affected by the number of trees, as long as it is sufficiently high. Furthermore, a higher number of trees introduces variance in the learnt model, which means that the trained model can learn the various classes even with a smaller number of associated training samples. Note that even for a higher number of trees, the model requires only a little more training time and memory to be stored by the system. Based on the above observations, we choose the RF parametrization to be .
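The difference between the conventional test accuracy and the performance adjusted accuracy can be sketched as follows. The function names and the representation of the ground truth as one set of optimal labels per test sample are assumptions made for illustration.

```python
def plain_accuracy(predictions, single_labels):
    """Conventional accuracy: one-to-one comparison with a single label."""
    hits = sum(1 for p, t in zip(predictions, single_labels) if p == t)
    return hits / len(predictions)


def performance_adjusted_accuracy(predictions, optimal_label_sets):
    """A prediction counts as correct if it is any of the optimal RAs."""
    hits = sum(1 for p, s in zip(predictions, optimal_label_sets) if p in s)
    return hits / len(predictions)
```

A prediction that differs from the stored label but still maximizes the transport capacity lowers the plain accuracy, while the performance adjusted accuracy is unaffected.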
For a better understanding of the RF model, we focus on the confusion matrix obtained for a part of the street section shown in Fig. 7(c). We focus on two parametrizations of RF that provide the best results according to Fig. 10: and . The confusion matrix is a tabulated summary of the performance of the classifier; each entry in a row of the confusion matrix shows how many samples of the true class are confused with one of the predicted classes. Fig. 11(a) shows the street section with 31 unique classes, or RAs, while Fig. 11(b) and (c) show the confusion matrices for that street section obtained for the aforementioned settings of the RF algorithm. Comparing the confusion matrices, we see that RF with shows an equivalent classification rate for all the classes (marked on the diagonal) compared to . This confirms our previous observation and therefore, the parametrization of is used for evaluating the proposed RA scheme.
Another important factor to consider while training a learning model is the number of training samples required to achieve a reasonable performance. Fig. 12 shows the test accuracy for the KNN and RF algorithms for different numbers of samples in the overall dataset. The results show that the test accuracy saturates for 10,000 samples in the dataset, for both learning frameworks. With a very small number of samples, KNN performs better than RF, but as the number of samples increases, RF performs consistently well compared to KNN. The random selection with replacement is not beneficial for RF when only a very small number of samples is available, but with an increased number of samples, RF can achieve up to 10% better accuracy than the simplest learning framework, the KNN. Overall, the test accuracy is the highest for a total of 125,000 samples in the dataset, which is why we evaluate the performance of all the schemes for a dataset size of 125,000 samples in the next subsection.
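For reference, a confusion matrix of the kind described above and its per-class classification rate (the normalized diagonal) can be computed as in this minimal sketch; the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels):
    """Rows are true classes, columns are predicted classes."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return classes, matrix


def per_class_rate(matrix):
    """Fraction of each true class that is predicted correctly (diagonal)."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]
```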
V-B Performance Results and Discussion
First, we present the results related to the considered dataset formulations. Fig. 13 shows the performance adjusted accuracy evaluated for the test datasets of Datasets 1, 2 and 3, with MIMO system for various scatterers’ densities when accurate position estimates are available to the system. We observe that Dataset 2 and Dataset 3 show comparable performance adjusted accuracy, irrespective of the number of scatterers present in the propagation environment. The performance adjusted accuracy for Dataset 1 is lower because the distribution of the number of samples per class is highly exponential, as shown in Section IV-D (Fig. 3), and thus the samples belonging to less frequently occurring classes are misclassified most of the time. For the training dataset of Dataset 2, the samples per class distribution shows uniform behavior, whereas for Dataset 3, the learning framework learns on all possible resource allocations for each position estimate. Therefore, both dataset formulations are less susceptible to misclassification. Table I shows the average transport capacity for the different RA schemes for the different dataset formulations. Overall, the dataset formulation does not affect the performance of the KNN-based and RF-based RA schemes, irrespective of the considered propagation scenario. In terms of performance comparison between different schemes, the proposed coordinates-based resource allocation scheme achieves a transport capacity very close to the upper bound, the CSI-based RA scheme, which is twice as much as the one achieved by the benchmark geometry-based RA scheme. The geometry-based RA scheme relies only on the position estimate of the terminal to determine the resource allocation, disregarding the presence of scatterers in the propagation environment, and thus suffers from deteriorated performance. In general, both Dataset 2 and Dataset 3 show similar performance with respect to the average transport capacity metric, but due to the ease of analysis of Dataset 2, as discussed in Section IV-D, we will use Dataset 2 for performance evaluation in the rest of the paper.
Dataset Formulation  RA Scheme  0/m  0.05/m

Datasets 1, 2, 3  CSI-based  [bps]  [bps]
Dataset 1  RF-based  [bps]  [bps]
Dataset 1  KNN-based  [bps]  [bps]
Dataset 1  Geometry-based  [bps]  [bps]
Dataset 2  RF-based  [bps]  [bps]
Dataset 2  KNN-based  [bps]  [bps]
Dataset 2  Geometry-based  [bps]  [bps]
Dataset 3  RF-based  [bps]  [bps]
Dataset 3  KNN-based  [bps]  [bps]
Dataset 3  Geometry-based  [bps]  [bps]
Fig. 14 shows the average transport capacity of the system when the density of scatterers varies in the propagation environment, with the assumption that the position estimates of the terminals are accurately known to the system. The results show that both the KNN-based and RF-based RA schemes are robust to the variation in scatterers’ density compared to the CSI-based RA scheme, where a higher scatterers’ density of 0.05/m leads to a performance difference of about 5% compared to 0/m, when the channel is deterministic. The RF-based RA scheme performs better than the KNN-based scheme due to the inherent randomness in the random trees that constitute the RF model, and can thus cater for the random channel behavior due to randomized scatterers’ placement, typically for the case of 0.05/m. We conclude from the above discussion that the coordinates-based resource allocation scheme using machine learning can be applied for determining appropriate resource allocations under favorable propagation scenarios, without relying on CSI collection. We now discuss the impact on the performance of the proposed scheme when either different antenna configurations are considered or erroneous position estimates are available in the system.
Fig. 15 shows the average transport capacity for different antenna configurations when no scatterers are present in the propagation environment, while Fig. 16 shows the same when the scatterer density varies up to 0.05/m. The first observation is that the average system capacity drops by 50% when the number of transmit antennas is reduced by half, due to the wider beam pattern which is a consequence of the reduced number of antennas. The transport capacity also decreases when the number of receive antennas is reduced to one, since no receive beamforming can be applied to enhance the received power for a given position of the terminal. The performance of the proposed scheme relative to the CSI-based scheme, however, is not affected by the antenna configuration in general. For the case with no scatterers, as shown in Fig. 15, the coordinates-based RA scheme performs on par with the CSI-based scheme for all antenna configurations, whereas the scheme performs consistently well when the scatterers’ density varies for any antenna configuration (Fig. 16). These results indicate that the proposed scheme can be used reliably with any antenna configuration for favorable propagation environments.
Another important aspect of investigation relates to the accuracy of the acquired position estimates of the terminals. Fig. 17 shows the average transport capacity of the system when the position estimates are known with varying error margins, for both the deterministic channel case and the randomly varying channel. The RF-based RA scheme is very robust to the different degrees of error in the acquired position estimates, compared to the KNN-based RA scheme, because of the inherent randomness in the trained RF model. A performance loss of about one-fifth of the transport capacity is observed when the acquired position estimates are highly erroneous, i.e. m, for the RF-based RA scheme, but the performance is consistent for an error m. The KNN-based RA scheme also performs consistently for position estimates having an error up to m, but results in a performance loss of one-third of the transport capacity compared to the CSI-based scheme when m. In general, the coordinates-based resource allocation scheme using machine learning performs on par with the legacy CSI-based scheme and is robust to the randomness introduced by the presence of scatterers for favorable propagation environments. The proposed scheme is also not affected by the considered antenna configuration and is quite robust to the erroneous position estimates acquired by the system, unless the position estimates are highly erroneous. All these observations are based on a sizeable amount of data acquired for uncorrelated samples. Next, we discuss the performance of the proposed RA scheme in comparison to other schemes when the dataset is constructed assuming real-time system simulation.
V-C Performance Evaluation for Correlated Channels
After observing the feasibility of the coordinates-based RA scheme through machine learning on the datasets comprising uncorrelated samples, we now evaluate its performance in a realistic system implementation. In real time, the data samples collected during the training-based mode of the proposed RA scheme (see Fig. 1) are collected on a continual basis, i.e. the collected samples have correlated channels associated with the estimates of the terminal’s position. These samples are then used to train the machine learning model, which is used for predicting the RA for a position estimate newly available to the system during the operation in position-based mode. For a realistic system implementation of the proposed coordinates-based RA scheme, the following key questions arise: (a) how many samples are sufficient to train a learning model, (b) how much training time is needed to build the RA prediction model, and (c) how much time does the model take to predict an RA for a new position estimate? In this work, we try to provide an intuition to answer these questions by designing a simple set of experiments.
We resort to a simulation-based setup, with the key idea of selecting a channel model that captures realistic channel behavior as accurately as possible. As mentioned before, the ray-tracer-based METIS channel model [20] has been validated for different propagation scenarios and is the state-of-the-art channel model available to date. Therefore, we use this channel model for the small street section, with the same system parametrization as considered in all the other experiments. Instead of using random drops, the terminal moves in a straight line across the street so that the collected samples have correlated channels. The starting position of the terminal is generated randomly, and the subsequent samples are collected by updating only the coordinates of the terminal’s position. We call the movement of the terminal along the street a trace, and collect several traces for generating the dataset. Each sample is collected after a time period of 1 ms in a single trace, and a collection of these samples is then used to construct the training dataset according to the dataset formulation selected in Section V-B. For a realistic system implementation, we assume the scatterers’ density to be 0.05/m, where the number of scatterers as well as their placement varies independently over each trace. Overall, 50 traces were generated to have a training dataset size comparable to that of the uncorrelated dataset, for fair evaluation. In terms of the RF parametrization, we use the RF model with for better real-time performance. To evaluate the performance of the trained RF model, we use the test dataset to emulate the real-time data acquisition when the system operates in the position-based mode. We compute the average system transport capacity for the proposed coordinates-based RA scheme with KNN and RF models, and compare it to the one obtained for the CSI-based scheme.
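The collection of one trace can be sketched as below. The terminal speed, the street dimensions, and the choice of the x coordinate as the direction of movement are assumptions made purely for illustration; only the 1 ms sampling period and the random starting position are taken from the description above.

```python
import random


def generate_trace(n_samples, speed_mps, street_length_m, street_width_m,
                   dt_s=1e-3, seed=None):
    """Positions of a terminal moving in a straight line, sampled every dt_s."""
    travel = speed_mps * dt_s * (n_samples - 1)
    if travel > street_length_m:
        raise ValueError("trace does not fit inside the street section")
    rng = random.Random(seed)
    x0 = rng.uniform(0.0, street_length_m - travel)  # random starting position
    y0 = rng.uniform(0.0, street_width_m)
    # Only one coordinate is updated from sample to sample.
    return [(x0 + speed_mps * dt_s * i, y0) for i in range(n_samples)]
```

Consecutive positions in such a trace are only a fraction of a metre apart, which is why the associated channels are correlated.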
Fig. 18 shows the performance results for the three RA schemes when either uncorrelated samples are used for training or different numbers of traces in the correlated dataset are used for training the machine learning models. We observe that a small number of traces is sufficient for generating the training dataset to achieve a performance comparable to the CSI-based RA scheme. Specifically, a training dataset size of 10,000 samples, collected over a period of about 17 seconds, is capable of achieving a performance very close to the upper bound, the CSI-based RA scheme. Furthermore, this performance is achieved with an RF model that needs only 937 bytes of memory storage for predicting the resource allocation. This is a surprising result: in real time, the proposed coordinates-based scheme needs only a couple of seconds to collect the training data, and provides a system performance very close to the CSI-based scheme with only a small-sized learnt model.
An efficient data collection process is vital for implementing the proposed coordinates-based RA scheme. This means that we also need to determine how frequently the training samples need to be acquired by the system. We apply different rates of undersampling on the correlated dataset with 10 traces, to see how stable the performance is when fewer samples are available to train the machine learning frameworks. Fig. 19 shows the average system transport capacity obtained on the uncorrelated test samples when the machine learning frameworks are trained with datasets of decreasing sample size. The results show that both the RF-based and KNN-based RA schemes have stable performance, unless an extreme rate of undersampling is applied. An important observation here is that the performance of RF is on par with KNN even when the undersampling rate is as small as 10 ms. To analyze this performance variation between RF and KNN as the number of training samples decreases, we look into the difference of the predicted performance between the two machine learning frameworks. Table II lists the performance difference between RF and KNN averaged over the number of samples where the chosen machine learning framework outperforms the other. The results show that the margin by which one machine learning framework outperforms the other is consistent across the different datasets, and the margin decreases as the number of samples in the training dataset becomes small.
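Undersampling a trace amounts to keeping every n-th sample. A minimal sketch, assuming that the target period is an integer multiple of the original 1 ms sampling period:

```python
def undersample(trace, sample_period_ms=1, target_period_ms=10):
    """Keep one sample per target period from a uniformly sampled trace."""
    if target_period_ms % sample_period_ms != 0:
        raise ValueError("target period must be a multiple of the sample period")
    step = target_period_ms // sample_period_ms
    return trace[::step]
```

With a 10 ms target period, a trace sampled every 1 ms is reduced to one tenth of its original size.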
Samples where:  Uncorrelated Data  Correlated Data (50 traces)  Correlated Data (10 traces)  Correlated Data (10 traces, 10 ms) 

RF is better  
KNN is better 
We also measured the impact of the performance difference between RF and KNN across different datasets by taking the sum of the transport capacity for the samples where one of the two machine learning frameworks performs better and normalizing this sum by the total number of samples in the test dataset, i.e. 41,667. Table III shows the resulting values, which indicate that RF outperforms KNN significantly when the number of training samples is sufficiently large (of the order of 10,000), but that this advantage reduces drastically when only a small number of training samples is available. Essentially, KNN performs on par with RF because the latter loses its generalization property due to the lack of sufficient data for accurate classification.
Samples where:  Uncorrelated Data  Correlated Data (50 traces)  Correlated Data (10 traces)  Correlated Data (10 traces, 10 ms) 

RF is better  
KNN is better 
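The two comparisons between RF and KNN reported in Tables II and III can be sketched as follows. The sketch assumes that the per-sample transport capacity achieved by each framework on the test dataset is available, and it interprets the summed quantity as the per-sample capacity difference; this interpretation and all names are assumptions made for illustration.

```python
def compare_frameworks(capacity_rf, capacity_knn):
    """Mean margin over the winning samples, and total margin per test sample."""
    n = len(capacity_rf)
    rf_gain = [a - b for a, b in zip(capacity_rf, capacity_knn) if a > b]
    knn_gain = [b - a for a, b in zip(capacity_rf, capacity_knn) if b > a]

    def summarize(gains):
        return {
            "mean_margin": sum(gains) / len(gains) if gains else 0.0,
            "normalized_sum": sum(gains) / n,
        }

    return {"rf_better": summarize(rf_gain), "knn_better": summarize(knn_gain)}
```

The mean margin ignores how often a framework wins, whereas the normalized sum also reflects the number of samples on which it wins.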
Parameters  50 Traces  20 Traces  10 Traces  10 Traces with 10 ms Undersampling 

Training Time  3.74 s  1.61 s  0.899 s  0.188 s 
Per-sample Prediction Time  83.67 µs  68.8 µs  62.34 µs  54.4 µs
All of these results for correlated channel datasets are generated with the assumption that the terminal position is accurately known by the system. In reality, however, the estimated position is inaccurate, involving some degree of error. We now assume that the acquired estimates of the terminal position are erroneous, with the error modelled as a zero-mean Gaussian with a variance determined by 0.4 m. Fig. 20 shows the resulting performance when erroneous positions are used for training the machine learning frameworks using 10 traces in the correlated dataset, with different rates of undersampling. The performance is quite robust, even when a small undersampling rate is applied, compared to the uncorrelated channels’ dataset. The same performance behavior between RF and KNN is observed here as with the perfect positions’ data: KNN performs on par with RF as the number of samples in the training dataset decreases.
In addition to the performance evaluation, we also calculated the training time required by RF, as well as the prediction time per sample, when different numbers of traces and undersampling rates are considered for generating the real-time dataset. Table IV presents the time required to train the RF model and the prediction time it takes per sample, for different sizes of the dataset with correlated channels. It takes less than a second to train the RF model, with , for a dataset size of 10,000 samples, and the model needs less than 1 kB of memory storage. The prediction time per sample is also very small, significantly less than the transmission time interval of 0.2 ms of the system. All these results point towards the feasibility of implementing the proposed coordinates-based resource allocation scheme using machine learning frameworks: the system can switch from the training-based mode to the position-based mode within a minute, and does not need to be retrained frequently, since the performance of the proposed scheme is quite stable compared to the CSI-based scheme, as shown by the results presented in this work.
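The comparison of the prediction time with the transmission time interval can be reproduced with a small timing helper. The helper below is a generic sketch using a wall-clock timer; the `reported_us` values are the per-sample prediction times of Table IV, read here as microseconds, which is an assumption based on the statement that they are well below the 0.2 ms interval.

```python
import time

TTI_S = 0.2e-3  # transmission time interval stated in the text (0.2 ms)


def mean_prediction_time(predict, inputs):
    """Average wall-clock time per call of `predict`, in seconds."""
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    return (time.perf_counter() - start) / len(inputs)


def fits_within_tti(prediction_time_s, tti_s=TTI_S):
    """True if a single prediction completes within one TTI."""
    return prediction_time_s < tti_s


# Per-sample prediction times from Table IV, assumed to be in microseconds.
reported_us = {"50 traces": 83.67, "20 traces": 68.8,
               "10 traces": 62.34, "10 traces, 10 ms": 54.4}
```

Under this reading, even the largest reported value occupies less than half of one transmission time interval.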
VI Conclusion
This work presented a detailed design of a coordinates-based resource allocation scheme using machine learning frameworks. We used supervised machine learning to learn the relationship between the terminal’s position coordinates and the associated resource allocation that maximizes the transport capacity of the system. A performance-adjusted accuracy metric was introduced to determine the prediction performance of the learning frameworks with different dataset formulations, based on which the dataset formulation was found to be the best. The average system transport capacity was used for performance comparison between the proposed coordinates-based resource allocation scheme, a simple geometry-based scheme and a legacy CSI-based resource allocation scheme. The results show that the proposed scheme outperforms the geometry-based scheme by a significant margin, irrespective of the antenna configuration considered in the system. The proposed scheme performs consistently well in comparison to the CSI-based resource allocation scheme and is robust to the different stochastic variations in the system. These results remain consistent when a realistic system setup is considered, where the samples used for training the machine learning frameworks have correlated channels. Surprisingly, the proposed scheme needs only a training dataset with samples collected over a couple of seconds to achieve 95% of the transport capacity of the CSI-based resource allocation scheme. In terms of system resources, the model learnt with the random forest algorithm needs less than one second to train and requires less than 1 kB of memory storage for predicting an appropriate resource allocation for a given terminal’s position estimate.
The results from this study are very encouraging and establish the feasibility of coordinates-based resource allocation for the communication link between an individual base station and a mobile terminal. In future work, we will extend our investigation by applying the proposed coordinates-based resource allocation scheme to an interference-limited system, i.e. multiple base stations serving multiple mobile terminals. The interference posed by both the interfering terminals and the neighbouring base stations will bring up new challenges for designing the coordinates-based resource allocation scheme. It will also be interesting to see how the training time and the size of the machine learning model scale in such a multiple-transmitter, multiple-user system.
Acknowledgment
A major part of the computations in this work was performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC, KTH.
References
 [1] Y. Arikawa, H. Uzawa, T. Sakamoto, and S. Shigematsu, “High-Speed Radio-Resource Scheduling for 5G Ultra-High-Density Distributed Antenna Systems,” in 8th International Conference on Wireless Communications & Signal Processing, Oct 2016, pp. 1–5.
 [2] J. Wang, C. Jiang, H. Zhang, Y. Ren, K. Chen, and L. Hanzo, “Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks,” IEEE Communications Surveys & Tutorials, pp. 1–1, 2020.
 [3] N. E. West and T. O’Shea, “Deep Architectures for Modulation Recognition,” in 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2017, pp. 1–6.
 [4] A. Testolin, M. Zanforlin, M. De Filippo De Grazia, D. Munaretto, A. Zanella, M. Zorzi, and M. Zorzi, “A Machine Learning Approach to QoE-Based Video Admission Control and Resource Allocation in Wireless Systems,” in 2014 13th Annual Mediterranean Ad Hoc Networking Workshop (MED-HOC-NET), 2014, pp. 31–38.
 [5] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A Deep Reinforcement Learning Based Framework for Power-Efficient Resource Allocation in Cloud RANs,” in 2017 IEEE International Conference on Communications (ICC), 2017, pp. 1–6.
 [6] R. Di Taranto et al., “Location-Aware Communications for 5G Networks: How Location Information can Improve Scalability, Latency, and Robustness of 5G,” IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 102–112, Nov 2014.
 [7] N. Marmasse and C. Schmandt, “Location-Aware Information Delivery with ComMotion,” in Handheld and Ubiquitous Computing, P. Thomas and H.-W. Gellersen, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000, pp. 157–171.
 [8] D. Slock, “Location Aided Wireless Communications,” in 5th International Symposium on Communications, Control and Signal Processing, May 2012, pp. 1–6.
 [9] P. Kela et al., “Location Based Beamforming in 5G Ultra-Dense Networks,” in 2016 IEEE 84th Vehicular Technology Conference (VTC-Fall), Sept 2016, pp. 1–7.
 [10] X. Chen, J. Lu, P. Fan, and K. B. Letaief, “Massive MIMO Beamforming With Transmit Diversity for High Mobility Wireless Communications,” IEEE Access, vol. 5, pp. 23 032–23 045, 2017.
 [11] S. H. Cha, J. S. Kim, and M. Y. Chung, “Coordinated-Beam Selection Scheme Using Mobility Pattern of Mobile Device in 5G Mobile Communication Systems,” in Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication, ser. IMCOM ’17. New York, NY, USA: ACM, 2017, pp. 53:1–53:6.
 [12] S. Sand, R. Tanbourgi, C. Mensing, and R. Raulefs, “Position Aware Adaptive Communication Systems,” in 43rd Asilomar Conference on Signals, Systems and Computers, Nov 2009, pp. 73–77.
 [13] S. Imtiaz, G. P. Koudouridis, and J. Gross, “On the Feasibility of Coordinates-Based Resource Allocation through Machine Learning,” in 2019 IEEE Global Communications Conference (GLOBECOM), Dec 2019, pp. 1–7.
 [14] A. Afifi, K. M. F. Elsayed, and A. Khattab, “Interference-Aware Radio Resource Management Framework for the 3GPP LTE Uplink with QoS Constraints,” in IEEE Symposium on Computers and Communications (ISCC), Jul 2013, pp. 693–698.
 [15] S. Imtiaz, G. P. Koudouridis, H. Ghauch, and J. Gross, “Random Forests for Resource Allocation in 5G Cloud Radio Access Networks Based on Position Information,” EURASIP Journal on Wireless Communications and Networking, vol. 2018, no. 1, pp. 142–157, 2018.
 [16] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
 [17] R. O. Duda, P. E. Hart, and D. G. Stork, “Pattern Classification and Scene Analysis,” 2nd ed. Wiley-Interscience, 1995.
 [18] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
 [19] E. Tuomaala and H. Wang, “Effective SINR Approach of Link to System Mapping in OFDM/Multi-Carrier Mobile Network,” in 2nd Asia Pacific Conference on Mobile Technology, Applications and Systems, Nov 2005.
 [20] V. Nurmela et al., “Deliverable D1.4: METIS Channel Models,” in Proc. Mobile Wireless Communication Enablers Inf. Soc. (METIS), 2015.