I Introduction
Future wireless networks will have to satisfy the quality-of-service (QoS) requirements of a large number of applications, such as video and data streaming, in addition to voice. Around , the new 5G mobile networks are expected to be deployed. Multimedia applications will be supported by 5G networks [1][2], and spectrum utilization will be an important aspect [2][3]. Compared with 4G, 5G will lead to much greater spectrum allocations and high aggregate capacity for users. Thus, network operators (OPs) will need new spectrum allocation techniques to utilize spectrum more effectively [2][3]. This is mainly because the dedicated spectrum of the OPs is found to be idle at various times.
In co-primary or horizontal spectrum sharing, the OPs have equal ownership of the spectrum [4]. Moreover, an a priori agreement should be reached on the spectrum usage with respect to long-term sharing by each OP. Co-primary spectrum sharing with multiple-input single-output (MISO) and multiple-input multiple-output (MIMO) multiuser transmission in two small cell networks was proposed in [5] and [6], respectively. The authors consider the case when each base station assigns its users to a shared band when the number of subcarriers in the dedicated band is not enough to serve all users. Subcarrier and power allocation methods are proposed in these scenarios [5], [6]. In [7], orthogonal spectrum sharing between two OPs was shown to be important in improving the overall throughput. Network efficiency is enhanced by sharing spectrum between two OPs. Link level simulation and hardware demonstrations are given. In [8], a potential game with a learning algorithm is shown to reach a system equilibrium which enhances spectrum efficiency between OPs. A distributed method is given to reduce the complexity of inter-OP spectrum sharing.
In our work, we propose multi-OP spectrum sharing in a small cell network. Each OP is assumed to serve multiple small base stations (SBSs) in an indoor scenario. The SBSs are considered to be spatially distributed according to a Poisson point process (PPP) inside a building. We also assume that the OPs connect to a central controller which is responsible for assigning resource blocks (RBs) from a common pool of spectrum to the OPs. We study the spectrum assignment problem in which the central controller can allocate multiple RBs to an OP and multiple OPs can utilize a single RB. Each OP is allowed to have a certain maximum number of RBs. Since the dedicated spectrum of each OP is assumed to be fixed, our study focuses only on allocating the shared spectrum to the OPs.
Our objective is to maximize the social welfare given in terms of the weighted sum rate of the OPs. The solution is considered in two parts. In the first part, RBs are assigned from the common pool of spectrum to the OPs. Once the OPs obtain their RBs, each SBS associated with an OP tries, in a distributed manner, to maximize its expected rate via power allocation so that a certain QoS is satisfied for each user equipment (UE). A many-to-one matching game with externalities is used to solve the first part of the problem. We extend the many-to-one matching framework so that each OP can be allocated more than one RB. Two methods are introduced to solve the problem in the first part, namely the greedy swap and Markov Chain Monte Carlo (MCMC) algorithms. A reinforcement learning method is used to solve the second part of the problem.
We also analyze the expected data rate of an SBS based on the spatial PPP distribution of SBSs and exploit it to prove the local optimality of pairwise stable matchings. Due to space constraints, the details of the derivations have been omitted in this paper; they will be provided in a journal version.
The rest of the paper is organized as follows: Section II describes the system model and the stochastic geometrical analysis of the expected rate of the SBSs. Section III presents inter-OP spectrum sharing by using the concept of matching theory. Section IV describes intra-OP spectrum sharing with reinforcement learning for power allocation. The performance evaluation results are presented in Section V. The conclusions are given in Section VI.
II System Model
We propose a multi-OP spectrum sharing scheme for small cell network deployment. The macro base stations (MBs) are assumed to transmit in channels orthogonal to the SBSs; thus, the interference from MBs to SBSs is absent. Each OP serves multiple SBSs, and each SBS serves a single user equipment (UE). The spectrum of OPs serving the SBSs is assumed to be divided into dedicated bands and a shared common band. The shared common band can be accessed by multiple OPs and can be allocated to their respective SBSs. The dedicated spectrum of each OP is assumed to be fixed and predetermined. Our study focuses only on allocating the shared spectrum to the OPs. All the SBSs employ the orthogonal frequency division multiple access (OFDMA) scheme for their channel access.
A set of multiple OPs is given by with OPs. Let the set of SBSs subscribed to an OP be given by the set with SBSs inside the building. We assume that each OP has the same intensity/density of SBSs per unit area. Also, let be the set of all the SBSs. Each SBS is assumed to serve a single UE. For each SBS in , we will denote its associated user equipment by UE.
The set of RBs in the common shared band available to the network is given by with RBs. Let be the set of RBs assigned to OP with RBs. Each SBS associated with OP is free to select any one of the RBs in to serve its UE. We assume that the SBS will randomly choose a single RB from . Hence, its transmit power allocation is restricted to a single RB. Let the total power of each SBS be given by , which is discretized into levels, where is a quantum of power. Thus, the set of transmit power levels that an SBS can choose from is . We shall denote the transmit power of the SBS by . The SBS is assumed to use a probabilistic scheme to select power level . Thus, any given action taken by each SBS can be represented by .
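The action model above can be illustrated with a minimal Python sketch. It assumes the power levels are the positive multiples of the quantum up to the total power (whether a zero level is included is not stated here), and the names `power_levels` and `sample_action`, the numeric values, and the uniform PMF are all hypothetical choices made for illustration.

```python
import random

def power_levels(p_max, num_levels):
    """Discrete transmit powers {q, 2q, ..., p_max} with quantum q = p_max / num_levels (assumed form)."""
    quantum = p_max / num_levels
    return [quantum * (k + 1) for k in range(num_levels)]

def sample_action(levels, pmf, rng):
    """Draw one power level according to the SBS's probability mass function."""
    return rng.choices(levels, weights=pmf, k=1)[0]

rng = random.Random(0)
levels = power_levels(p_max=0.1, num_levels=4)   # illustrative values in watts
pmf = [0.25, 0.25, 0.25, 0.25]                   # hypothetical uniform strategy
assigned_rbs = [0, 1, 2]                         # hypothetical RBs assigned to the SBS's OP
rb = rng.choice(assigned_rbs)                    # the SBS picks a single RB uniformly at random
power = sample_action(levels, pmf, rng)
action = (rb, power)                             # one action: an RB together with a power level
```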
We assume that the RB allocated to an OP can be accessed by more than one SBS associated with that OP. Thus, the expected rate of the UE associated with SBS is given by
(1) 
where is the transmit power of SBS on RB, is the channel gain between UE and SBS using RB. The is the set of SBSs using the same RB while is the noise variance. Here the expectation is taken with respect to the channel gain, distance geometry, as well as probabilistic channel access and power allocation strategy. We assume that the fading is Rayleigh. The interference experienced by a UE of an SBS can be considered as either intra-OP interference or inter-OP interference. The intra-OP interference is caused by the fact that the SBSs associated with a given OP can access any RB assigned to that OP. Thus, two SBSs served by one OP can access the same RB. On the other hand, the inter-OP interference is caused by the fact that a given RB can be shared by two or more OPs.
The expected system rate of OP will be the sum of expected rates of each SBS. We can express the rate of OP as,
(2) 
where is the weight at each SBS.
II-A Analysis of Expected Rate
In the following, we first present an expression for the expected rate of a generic SBS based on the spatial distribution of the SBSs, after which we use it for the game formulations. Let the rate of a generic downlink SBS-UE system transmitting in RB and at power level be given by
(3) 
Here, explicitly incorporating the distance attenuation in the SINR formula one gets,
(4) 
where denotes pathloss exponent and is the distance between the UE and SBS. We take the expectation with respect to the channel gains and interference nodes,
where is the aggregate interference experienced by UE in RB . Using the fact that for positive random variables , the expectation becomes
Using the Laplace transform [9], [10], after some derivations, we finally obtain the expected rate as,
(5) 
where . is the intensity of SBSs.
The above analysis holds for any SBS located at any location. This is guaranteed by Slivnyak’s theorem [9], according to which the statistics for the PPP are independent of the test location. This also implies that the SBSs transmitting over an RB are identical. That is, if every SBS allocates its power according to an identical randomizing principle, then the probability mass function (PMF) of should be identical to the PMF of . Assuming that the PMF of and are stationary, then is a time independent constant. Thus, the value of depends only on the value of transmit power level chosen by the SBS. Also, since SBS can access any one of RBs assigned to its associated OP with equal probability of , the expected rate of SBS is given by
(6) 
This average rate depends only on , the intensity of interfering SBSs.
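The dependence of the expected rate on the interferer intensity can be checked numerically. The following Monte Carlo sketch is not the closed-form derivation; it assumes a square region with the UE at its center, unit-mean Rayleigh (exponential) power fading, a fixed link distance, equal transmit power at all interferers, and a 1 m clamp on interferer distance to avoid a singular path loss. The function names and all numeric values are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Sample a Poisson count by counting unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def expected_rate(intensity, power, side=50.0, link_dist=5.0,
                  alpha=3.0, noise=1e-9, trials=2000, seed=0):
    """Estimate E[log2(1 + SINR)] for a UE whose co-channel interferers form a PPP."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        n = poisson(intensity * side * side, rng)   # number of interferers in the square
        interference = 0.0
        for _ in range(n):
            x = rng.uniform(-side / 2, side / 2)
            y = rng.uniform(-side / 2, side / 2)
            d = max(math.hypot(x, y), 1.0)          # clamp distance (assumption)
            interference += power * rng.expovariate(1.0) * d ** (-alpha)
        signal = power * rng.expovariate(1.0) * link_dist ** (-alpha)
        total += math.log2(1.0 + signal / (interference + noise))
    return total / trials

sparse = expected_rate(intensity=0.001, power=0.1)  # few interferers per unit area
dense = expected_rate(intensity=0.02, power=0.1)    # many interferers per unit area
```

Under these assumptions the estimate for the denser deployment comes out lower, which matches the qualitative statement that the average rate is governed by the intensity of interfering SBSs.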
III Multi-Operator Spectrum Sharing using Matching Game
Consider the social welfare of the network as the overall weighted sum rate as follows:
(7) 
where is a matching matrix . We denote the matrix X as,
(8) 
where is a matching. is the weight at the OP.
The objective of the matching game for the multi-OP spectrum sharing is to maximize the social welfare. Thus, the optimization problem can be expressed as,
s.t.  (9)  
Condition (C1) assures that each RB can be allocated to at most OPs, and condition (C2) guarantees that each OP gets at most RB.
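To make the structure of the problem concrete, the sketch below evaluates a candidate assignment against constraints of the kind described by (C1) and (C2) and computes a weighted sum rate. The orientation of the matrix (rows as RBs, columns as OPs), the quota arguments, and the rate model `toy_op_rate` are assumptions made for illustration; the actual rate of an OP is the one defined in Section II.

```python
def is_feasible(X, rb_quota, op_quota):
    """C1: each RB (row) serves at most rb_quota OPs. C2: OP o (column) holds at most op_quota[o] RBs."""
    rows_ok = all(sum(row) <= rb_quota for row in X)
    cols_ok = all(sum(X[r][o] for r in range(len(X))) <= op_quota[o]
                  for o in range(len(X[0])))
    return rows_ok and cols_ok

def social_welfare(X, weights, op_rate):
    """Weighted sum of the OP rates under assignment X."""
    return sum(weights[o] * op_rate(X, o) for o in range(len(weights)))

def toy_op_rate(X, o):
    """Hypothetical rate: each RB held by OP o is worth 1 / (number of OPs sharing that RB)."""
    return sum(1.0 / sum(row) for row in X if row[o])

# Two RBs (rows) and three OPs (columns); X[r][o] = 1 if RB r is assigned to OP o.
X = [[1, 1, 0],
     [0, 1, 1]]
feasible = is_feasible(X, rb_quota=2, op_quota=[1, 2, 1])
welfare = social_welfare(X, weights=[1.0, 1.0, 1.0], op_rate=toy_op_rate)
```

The toy rate function only captures the externality that sharing an RB with more OPs lowers the value of that RB.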
III-A Many-to-One Matching with Externalities
We define the matching game over two sets of players () with the preference relation which allows each player to build preferences over the set of RBs . In our case, we assume that the set of RBs gives equal preference to OPs. That is, in the allocation of an RB, there is no preference for a specific OP. However, we follow the framework described in [11] that directly deals with utilities rather than preferences.
With the many-to-one matching framework, at most one RB will be allocated to an OP. However, our problem allows us to allocate more than one RB to an OP, as given by constraint (C2) in (9). To tackle this problem, we create an augmented set of players by producing identical copies of OPs. Each copy of OP inherits all the SBSs associated with its parent OP . Let denote the set of identical copies of OP, which we shall refer to as the set of children OP. Thus, our augmented set of OPs is . Since each child OP is assigned at most one RB in many-to-one matching, if the number of children OPs is equal to the maximum number of required RBs, then this method guarantees that each parent OP can obtain more than one RB. At the same time, by allocating at most one RB to each child OP, it ensures that each parent OP will get at most the maximum number of allowed RBs.
However, it requires that the group of players , which are the set of children of OP, coordinate with each other such that no two players in select the same RB. Otherwise, each parent OP will be assigned fewer RBs than it requires. As illustrated in Fig. 1, OP requires two RBs, so it makes two copies of itself; whereas OP requires three RBs, so it makes three copies of itself.
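The augmentation step can be sketched in a few lines of Python. The representation of a child OP as a `(parent, copy_index)` pair and the helper names are hypothetical; the example mirrors the case of one OP requiring two RBs and another requiring three.

```python
def augment(max_rbs):
    """Create one child OP per required RB; child k of parent p is the pair (p, k)."""
    return [(parent, k) for parent, count in enumerate(max_rbs) for k in range(count)]

def parent_allocation(matching):
    """Collect the distinct RBs obtained by each parent from a child -> RB matching."""
    alloc = {}
    for (parent, _), rb in matching.items():
        alloc.setdefault(parent, set()).add(rb)
    return alloc

children = augment([2, 3])   # parent 0 needs two RBs, parent 1 needs three

# If the two children of parent 0 pick the same RB, the parent ends up with one RB only.
colliding = parent_allocation({(0, 0): "rb_a", (0, 1): "rb_a"})
distinct = parent_allocation({(0, 0): "rb_a", (0, 1): "rb_b"})
```

The colliding case shows why sibling OPs need to coordinate: duplicated choices reduce the number of RBs the parent actually obtains.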
For a given parent OP, we will take the rate of SBS and children OP to be given by (6) and (2) respectively. We will take the rate of parent OP as
(10) 
We extend the idea of swap matching as given in [11], which considers peer effects of a social network and a weaker notion of stability, known as two-sided exchange stability. We propose a decentralized approach that can guarantee the number of RBs required for each OP while at the same time ensuring that each RB is not utilized by more than the desired number of OPs.
Definition : For many-to-one matching, a matching is a subset such that and where and .
Also, for any , let denote the co-sharers of an RB which are children of the same parent OP as . We will denote the desirability of RB for any OP by . In our case, the desirability of an RB for children OP is given by the weighted sum rate obtained by the OP when it accesses that RB as given in (2). For a given matching , we can write the desirability as . The utility of OP is given by,
(11) 
where the indicator function is given by
In other words, if two children of the same parent OP access the same RB, they will be punished. This has the effect of ensuring that two sibling OPs will access different RBs.
A swap matching is a matching in which the OPs and switch places while keeping all assignments of other OPs the same.
Two possible algorithms are given as Algorithm 1 and Algorithm 2. Algorithm 1 proceeds in a greedy fashion to improve the social welfare and can be implemented in a distributed manner. Since the social welfare strictly improves with each iteration and the number of matchings is finite, this algorithm converges to a two-sided exchange-stable matching. Algorithm 2 proceeds to optimize the social welfare via the Markov Chain Monte Carlo (MCMC) method.
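As an illustration of the two search strategies, the sketch below implements a greedy swap search and a Metropolis-style random search over child-to-RB matchings. It is not a reproduction of Algorithm 1 or Algorithm 2: the toy `gain` table, the zero-utility sibling penalty, the extra move of a child to an RB with a free slot, the acceptance rule, and the temperature `temp` are all assumptions.

```python
import itertools
import math
import random

def welfare(match, gain):
    """Sum of child utilities; a child sharing an RB with a sibling earns nothing (assumed penalty)."""
    total = 0.0
    for (parent, _), rb in match.items():
        users = [c for c, r in match.items() if r == rb]
        same_parent = sum(1 for (p, _) in users if p == parent)
        if same_parent == 1:
            total += gain[parent][rb] / len(users)   # toy rate, split among co-sharers
    return total

def moves(match, num_rbs, quota):
    """Neighbours: swap the RBs of two children, or move one child to an RB with a free slot."""
    load = [sum(1 for r in match.values() if r == rb) for rb in range(num_rbs)]
    for a, b in itertools.combinations(match, 2):
        if match[a] != match[b]:
            yield {**match, a: match[b], b: match[a]}
    for a in match:
        for rb in range(num_rbs):
            if rb != match[a] and load[rb] < quota:
                yield {**match, a: rb}

def greedy_swap(match, gain, num_rbs, quota):
    """Apply any move that strictly raises the welfare until no such move exists."""
    improved = True
    while improved:
        improved = False
        for cand in moves(match, num_rbs, quota):
            if welfare(cand, gain) > welfare(match, gain) + 1e-12:
                match, improved = cand, True
                break
    return match

def mcmc(match, gain, num_rbs, quota, steps=2000, temp=0.05, seed=0):
    """Random moves; worse moves are accepted with probability exp(delta / temp)."""
    rng = random.Random(seed)
    best = match
    for _ in range(steps):
        cand = rng.choice(list(moves(match, num_rbs, quota)))
        delta = welfare(cand, gain) - welfare(match, gain)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            match = cand
            if welfare(match, gain) > welfare(best, gain):
                best = match
    return best

gain = [[1.0, 0.2, 0.5],      # hypothetical desirability of RBs 0..2 for parent 0
        [0.3, 1.0, 0.4]]      # and for parent 1
start = {(0, 0): 1, (0, 1): 2, (1, 0): 0}   # parent 0 has two children, parent 1 has one
greedy_result = greedy_swap(start, gain, num_rbs=3, quota=2)
mcmc_result = mcmc(start, gain, num_rbs=3, quota=2)
```

The greedy search stops at the first matching that no single move can improve, whereas the random search can leave such a matching by occasionally accepting a worse one, at the cost of more iterations.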
III-B Stability of Many-to-One Matchings with Externalities
In this part, we show the existence of the many-to-one stable matching with externalities for multi-OP spectrum sharing. We prove that all local maxima of the social welfare are pairwise stable. We first define what we mean by local maxima and then give a lemma, after which we state the theorem. First, let the potential of the system be defined as,
(12) 
Definition : A local maximum of the potential is a matching for which there exists no matching which is obtained from by swapping any two OPs such that .
We now show that the desirability of RB for the rest of the OPs that use this RB, and which are not involved in a swap process, does not change after the swap has occurred.
Lemma : For any swap matching , for .
Lemma : Any swap matching such that,

and

,
leads to .
Since the number of matchings is finite, there exists at least one optimal matching which leads to the maximum social welfare. The following theorem ensures that this matching is pairwise stable.
Theorem : All local maxima of are pairwise stable.
Corollary : If for all , then all local maxima of system objective are pairwise stable.
IV Intra-Operator Spectrum Sharing using Reinforcement Learning Strategy
In this section, we propose a mechanism of self-organizing networks based on reinforcement learning. We assume that all the SBSs are able to estimate the interference they experience at each RB and accordingly tune their transmission strategies towards a better performance based on Q-learning.
IV-A Q-learning
The Q-learning model consists of a set of states and actions aiming at finding a policy that maximizes the observed rewards over the interaction time of the agents/players (i.e., small cells). Every SBS subscribed to an OP, where explores its environment, observes its current state , and takes a subsequent action , according to a decision policy .
For each OP belonging to the set , let us denote by the learning game. Here, the players of the game are the SBSs which seek to allocate power in the RBs assigned to their corresponding OP. The is the state of SBS at time . The state of an SBS is a binary variable, , which indicates whether SBS experiences interference in RB assigned to its corresponding OP such that its required QoS is violated. The QoS requirement is said to be violated when , where is given by (4). The is the action of SBS, where . Any given action can be represented by an integer variable , where represents the power level. Finally, is the utility function or payoff of SBS at time instant , which we take as the instantaneous rate of SBS at time instant as given by (3) if the QoS is satisfied, otherwise it is taken to be zero:
(13) 
The expected discounted reward over an infinite horizon can be given by:
(14) 
where is a discount factor and is the agent’s reward at time . is the mean value of reward , and is the transition probability from state to . For a given policy , we can define a value as:
(15) 
which is the expected discounted reward when executing action at state and then following policy thereafter. The actions are chosen according to their values as:
(16) 
The learning process aims at finding in a recursive manner where the update equation is given in [12].
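The recursive update can be sketched with the standard tabular Q-learning rule, which is assumed here and may differ in detail from the update in [12]. The sketch uses the binary state (QoS met or violated) and the reward described above (rate if the SINR threshold is met, zero otherwise). The epsilon-greedy exploration, the single-agent environment `toy_sinr`, and all numeric values are hypothetical; in the multi-SBS setting the SINR would also depend on the actions of the other SBSs.

```python
import math
import random

def q_learning(levels, sinr_of, threshold, episodes=5000,
               lr=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning over two states (0: QoS met, 1: QoS violated) and discrete power levels."""
    rng = random.Random(seed)
    q = [[0.0] * len(levels) for _ in range(2)]
    state = 0
    for _ in range(episodes):
        if rng.random() < eps:                      # explore (assumed epsilon-greedy policy)
            action = rng.randrange(len(levels))
        else:                                       # exploit the current estimates
            action = max(range(len(levels)), key=lambda a: q[state][a])
        sinr = sinr_of(levels[action], rng)
        satisfied = sinr >= threshold
        reward = math.log2(1.0 + sinr) if satisfied else 0.0
        next_state = 0 if satisfied else 1
        # Standard update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        q[state][action] += lr * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
    return q

def toy_sinr(power, rng):
    """Hypothetical channel: exponential fading over a fixed interference-plus-noise level."""
    return power * rng.expovariate(1.0) / 0.01

levels = [0.025, 0.05, 0.075, 0.1]
q_table = q_learning(levels, toy_sinr, threshold=2.0)
greedy_power = levels[max(range(len(levels)), key=lambda a: q_table[0][a])]
```

After training, each SBS would act on the learned table, here by taking the power level with the largest value in its current state.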
V Numerical Results
In this section, we present numerical results to evaluate the performance of our multi-OP spectrum sharing framework and proposed algorithms. The SBSs are spatially distributed in a PPP within a square area with sides of meters, and each OP has the same density of SBSs per unit area. For OPs, let the maximum number of RBs required by each OP be given by the vector . The vector c tells us how many children each parent OP will have in the augmented OP set. For simplicity, we assume the weights in the social utility function and at each SBS to be . Each SBS has one UE associated with it. The UE is located within meters of the SBS. The pathloss between SBS and SBS-UE at distance meters is given by dB, and the pathloss due to the wall is dB. The standard deviation of lognormal shadow fading is assumed to be dB. The maximum transmit power of each SBS is dBm, and the noise variance is dBm. The SINR threshold at each user is dB. Each plot is based on random samples.
In Fig. 2, we plot the cumulative distribution function (CDF) of the overall social welfare (bits/sec/Hz) for different numbers of OPs and different power allocation schemes. We fix the number of available RBs to and the number of OPs to utilize the same RBs, for all . The set of augmented OPs for the four different cases considered are for , for , for , and for . We consider cases when each SBS allocates power to its UE using uniform power allocation, Q-learning and full power allocation. Although full power allocation from an SBS to its UE causes higher interference, the achievable data rate at each UE is counted only if the QoS is satisfied. Thus, with higher transmit power from the SBSs, the data rate increases. With Q-learning, the CDF is lower than with full power allocation, but Q-learning can save more power at the SBS. Furthermore, we observe that the CDF improves as the number of OPs increases. This is because more OPs utilize the available RBs of the common pool spectrum, and hence the overall social utility tends to increase.
In Fig. 3, the convergence of the MCMC and greedy swap algorithms is demonstrated with Q-learning and full power allocation. We fix for , , and . With full power allocation, both the MCMC and greedy swap algorithms achieve higher social welfare (bits/sec/Hz) than with Q-learning. The greedy swap algorithm converges faster than the MCMC algorithm. On the other hand, the MCMC algorithm provides higher social welfare than the greedy swap algorithm under both Q-learning and full power allocation.
In Fig. 4, we fix the maximum number of RBs for each OP to be for , for , for , for and . We plot the average social welfare per OP (bits/sec/Hz/OP) as the number of OPs is varied. We observe that as the number of OPs increases, the average social welfare per OP decreases. This is because increasing the number of OPs causes more interference in the system. However, the overall social welfare still increases, as shown by the CDF in Fig. 2. Moreover, when we increase the number of RBs, the average social welfare per OP increases. This is not surprising since the number of available RBs to choose from has increased.
In Fig. 5, the CDF of the overall social welfare is shown when , , and when and for various values of c where . We can observe that when the size of c increases, the CDF of overall social welfare decreases for both and . For example, when and for , the CDF of overall social welfare is much better when . This is because increasing the value of has the effect of increasing the size of the augmented set of OPs. Thus, the children OP of a parent OP can use the same RB used by the children OP of other parent OPs, which tends to increase the interference. Hence, it affects the parent OPs and decreases the overall social welfare.
VI Conclusion
In this paper, we have considered multi-OP spectrum sharing in an indoor deployment scenario. We have studied a scenario where multiple OPs share some parts of their spectrum among each other. We have cast this spectrum sharing as a social welfare maximization problem. The main problem has been decomposed into two parts. The first part is to assign resource blocks to multiple OPs, while in the second part each SBS associated with an OP tries to maximize its expected rate via a distributed power allocation method. The many-to-one matching game with externalities has been extended to two-sided matching to deal with the first part of the problem. We have created an augmented set of players by producing identical copies of OPs. Since each augmented OP is assigned at most one RB, the number of augmented OPs is set equal to the maximum number of required RBs. This method thus guarantees that each main OP can obtain more than one resource block. In the second part of the problem, Q-learning has been proposed as the power allocation method for each SBS of an OP. Matching and Q-learning are performed iteratively until convergence.
References
 [1] E. Hossain, M. Hasan, “5G cellular: key enabling technologies and research challenges,” in IEEE Instrumentation Measurement Magazine, vol. 18, pp. 11-21, Jun. 2015.
 [2] Nokia, “Looking Ahead to 5G, Building a virtual zero latency gigabit experience,” in Nokia White Paper, 2014.
 [3] Ericsson, “5G Radio Access,” in The communication technology journal since 1924, Jun. 2014.
 [4] T. Irnich, J. Kronander, Y. Selen, G. Li, “Spectrum Sharing Scenarios and Resulting Technical Requirement for 5G Systems,” in IEEE International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC Workshops), pp. 127-132, Sep. 2013.
 [5] T. Sanguanpuak, N. Rajatheva, M. Latva-aho, “Co-Primary Spectrum Sharing with Resource Allocation in Small Cell Network,” in Proceedings of 1st International Conference on 5G for Ubiquitous Connectivity (5GU), pp. 6-10, Nov. 2014.
 [6] T. Sanguanpuak, S. Guruacharya, N. Rajatheva, M. Latva-aho, “Resource Allocation for Co-Primary Spectrum Sharing in MIMO Networks,” in Proceedings of IEEE International Conference on Communications Workshop (ICC Workshop), Jun. 2015.
 [7] E. A. Jorswieck, L. Badia, T. Fahidieck, E. Karipidis, J. Luo, “Spectrum Sharing Improves the Network Efficiency for Cellular Operators,” in IEEE Communications Magazine, pp. 129-136, Mar. 2014.
 [8] L. Yu-Ting, H. Tembine, C. Kwang-Cheng, “Inter-operator spectrum sharing in future cellular systems,” in IEEE Global Communications Conference (GLOBECOM), pp. 2597-2602, Dec. 2012.
 [9] F. Baccelli, B. Blaszczyszyn, Stochastic Geometry and Wireless Networks, Volume I - Theory, NoW Publishers, 2009.
 [10] P. Semasinghe, E. Hossain, and K. Zhu, “An Evolutionary Game for Distributed Resource Allocation in Self-Organizing Small Cells,” IEEE Transactions on Mobile Computing, vol. 14, no. 2, pp. 274-287, Feb. 2015.

 [11] E.B. Baron, C. Lee, A. Chong, B. Hassibi, and A. Wierman, “Peer Effects and Stability in Matching Markets,” in Proceedings of International Conference on Algorithm game theory, pp. 117-129, Jul. 2011.
 [12] M. Bennis, S. Guruacharya, D. Niyato, “Distributed Learning Strategies for Interference Mitigation in Femtocell Networks,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM), pp. 1-5, Dec. 2011.