I Introduction
Machine learning (ML) and stochastic geometry (SG) have recently found many applications in the design and analysis of wireless networks. However, since the nature of the problems studied with these tools is so fundamentally different, it is rare to find a common ground where the strengths of these tools can be jointly leveraged. While the foundation of wireless networks is built on traditional probabilistic models (such as channel, noise, interference, and queuing models), ML is changing this model-driven approach to a more data-driven, simulation-based approach by learning the models from extensive datasets obtained from real networks or field trials [1]. On the other hand, the basic premise of SG is to enhance the model-driven approach by endowing distributions on the locations of the transmitters (Txs) and receivers (Rxs) so that one can derive exact and tractable expressions for key performance metrics such as interference, coverage, and rate. In this paper, we concretely demonstrate that these two mathematical tools can be jointly applied to a class of problems known as subset selection problems, which have numerous applications in wireless networks.
Subset selection problems. In wireless networks, a wide class of resource management problems, such as power/rate control, link scheduling, network utility maximization, and beamformer design, fall into the category of subset selection problems, where a subset of a ground set needs to be chosen so as to optimize a given objective function. In most cases, finding the optimal subset is NP-hard. The common practice in the literature is to design heuristic algorithms that find a local optimum under reasonable complexity. Even most of these heuristic approaches are NP-complete and are hence difficult to implement when the network size grows large.
In ML, subset selection problems appear in a slightly different context, where the primary objective is to preserve the balance between the quality and diversity of the items in the subset, i.e., to select good quality items from a ground set which are also non-overlapping in terms of their features. For example, assume that a user is searching for images of New York in a web browser. The image search engine will pick a subset of stock images related to New York from the image library which contains the popular landmarks (quality), while also ensuring that one particular landmark does not occur repeatedly in the search result (diversity). A few more examples of subset selection with diversity are text summarization
[2], citation management [3], and sensor placement [4]. The attempt to model diversity among the items in a subset selection problem brings us to the probabilistic models constructed by determinantal point processes (DPPs), which lie at the intersection of ML and SG. Initially formulated as repulsive point processes in SG [5], DPPs are a natural choice for inducing diversity or negative correlation between the items in a subset. Although the traditional theoretical development of DPPs has focused on continuous spaces, the finite version of DPPs has recently emerged as a useful probabilistic model for subset selection problems with a quality-diversity tradeoff in ML. This is due to the fact that finite DPPs are amenable to the data-driven learning and inference framework of ML [3].

Relevant prior art on DPPs. In wireless networks, DPPs have mostly been used in the SG-based modeling and analysis of cellular networks. In these models, DPPs are used to capture spatial repulsion in the base station (BS) locations, which cannot be modeled using the more popular Poisson point process (PPP) [5]. For some specific DPPs, for instance the Ginibre point process, it is possible to analytically characterize the performance metrics of the network, such as the coverage probability [6]. However, the finite DPPs and the associated data-driven learning framework, which are under rapid development in the ML community, have not found any notable application in wireless networks. The only existing work is [7], where the authors introduce a new class of data-driven SG models using DPPs and train them to mimic the properties of some hard-core point processes used for wireless network modeling (such as the Matérn type-II process) in a finite window.
Contributions. The key technical contribution of this paper is the novel DPP-based learning (DPPL) framework for solving general subset selection problems in wireless networks. In order to concretely demonstrate the proposed DPPL framework, we apply it to the link scheduling problem, which is a classical subset selection problem in wireless networks. The objective is to assign optimal binary power levels to Tx-Rx pairs so as to maximize the sum-rate [8]. The links transmitting at the higher (lower) power level will be termed active (inactive) links. Therefore, the objective is to determine the optimal subset of simultaneously active links. Similar to the subset selection problems in ML, the simultaneously active links are selected by balancing between quality and diversity. The links that are naturally favored are the ones with better link quality in terms of signal-to-interference-and-noise ratio (SINR), so that the rates on these links contribute more to the sum-rate (quality). On the other hand, the simultaneously active links exhibit some degree of spatial repulsion to avoid mutual interference (diversity).
With this insight, it is reasonable to treat the set of active links in the optimal solution as a DPP over the set of links in a given network. The DPP is trained on a sequence of networks and their optimal subsets, which are generated by an optimization algorithm based on geometric programming (GP). We observe that the sum-rates of the estimated optimal subsets generated by the trained DPP closely approach the optimal sum-rates. Moreover, we show that subset selection using the DPP is significantly more computationally efficient than optimization-based subset selection methods.
II Determinantal Point Process: Preliminaries
In this section, we provide a concise introduction to DPPs on finite sets. The interested reader may refer to [3] for a more pedagogical treatment of the topic as well as extensive surveys of the prior art. In general, DPPs are probabilistic models that quantify the likelihood of selecting a subset of items as the determinant of a kernel matrix ($\mathbf{K}$). More formally, if $\mathcal{Y} = \{1, 2, \dots, N\}$ is a discrete set of $N$ items, a DPP is a probability measure $\mathcal{P}$ on the power set $2^{\mathcal{Y}}$ which is defined as:

$$\mathcal{P}(A \subseteq \mathbf{Y}) = \det(\mathbf{K}_A), \quad (1)$$
where $\mathbf{Y}$ is a random subset of $\mathcal{Y}$ and $\mathbf{K}_A$ denotes the restriction of $\mathbf{K}$ to the indices of the elements of $A \subseteq \mathcal{Y}$. We denote $\mathbf{K}$ as the marginal kernel, which is a positive semidefinite matrix such that $\mathbf{K} \preceq \mathbf{I}$ ($\mathbf{I}$ is the identity matrix), i.e., all eigenvalues of $\mathbf{K}$ are less than or equal to 1. For learning purposes, it is more useful to define the DPP with another formalism known as the $L$-ensemble. A DPP can be alternatively defined in terms of a positive semidefinite matrix $\mathbf{L}$ indexed by the elements of $\mathcal{Y}$:

$$P_{\mathbf{L}}(\mathbf{Y} = A) = \frac{\det(\mathbf{L}_A)}{\sum_{A' \subseteq \mathcal{Y}} \det(\mathbf{L}_{A'})} = \frac{\det(\mathbf{L}_A)}{\det(\mathbf{L} + \mathbf{I})}, \quad (2)$$

where the last step follows from the identity $\sum_{A \subseteq \mathcal{Y}} \det(\mathbf{L}_A) = \det(\mathbf{L} + \mathbf{I})$ (see [3, Theorem 2.1] for the proof). Following [3, Theorem 2.2], $\mathbf{K}$ and $\mathbf{L}$ are related by the following equation:
$$\mathbf{K} = \mathbf{L}(\mathbf{L} + \mathbf{I})^{-1} = \mathbf{I} - (\mathbf{L} + \mathbf{I})^{-1}. \quad (3)$$
Since $\mathbf{L}$ is real and symmetric by definition, its eigendecomposition is $\mathbf{L} = \sum_{n=1}^{N} \lambda_n \mathbf{v}_n \mathbf{v}_n^{\mathrm{T}}$, where $\{\mathbf{v}_n\}$ is the orthonormal sequence of eigenvectors corresponding to the eigenvalues $\{\lambda_n\}$. Using (3), $\mathbf{K}$ can also be obtained by rescaling the eigenvalues of $\mathbf{L}$ as:

$$\mathbf{K} = \sum_{n=1}^{N} \frac{\lambda_n}{\lambda_n + 1}\, \mathbf{v}_n \mathbf{v}_n^{\mathrm{T}}. \quad (4)$$
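The normalization behind (2) and the relations (3)-(4) can all be verified numerically on a small ground set. The sketch below is our illustration (not part of the original paper): `L` is an arbitrary random positive semidefinite kernel, the subsets are enumerated by brute force, and the marginal kernel is computed both ways.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
L = B @ B.T  # a random 4x4 positive semidefinite ensemble kernel
N = L.shape[0]

def det_sub(M, A):
    """det(M_A); the determinant of the empty 0x0 matrix is 1."""
    return np.linalg.det(M[np.ix_(list(A), list(A))])

# (2): sum_A det(L_A) = det(L + I), so the L-ensemble probabilities sum to 1.
subsets = [A for r in range(N + 1)
           for A in itertools.combinations(range(N), r)]
total = sum(det_sub(L, A) for A in subsets) / np.linalg.det(L + np.eye(N))
print(round(total, 10))  # 1.0

# (3): K = L (L + I)^{-1} agrees with (4): rescaling each eigenvalue
# lambda_n of L to lambda_n / (lambda_n + 1).
K_direct = L @ np.linalg.inv(L + np.eye(N))
lam, V = np.linalg.eigh(L)
K_eig = (V * (lam / (lam + 1.0))) @ V.T
print(np.allclose(K_direct, K_eig))  # True
```

The brute-force enumeration is only feasible for tiny ground sets; the point is just to confirm that the closed-form normalization makes the exponential sum unnecessary.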
In the ML formalism, if $\mathbf{b}_i$ is some vector representation of the $i$-th item of $\mathcal{Y}$, then $\mathbf{L}$ can be interpreted as a kernel matrix, i.e., $L_{ij} = k(\mathbf{b}_i, \mathbf{b}_j) = \phi(\mathbf{b}_i)^{\mathrm{T}}\phi(\mathbf{b}_j)$, where $k$ is a kernel function and $\phi$ is the corresponding feature map. The kernel can be further decomposed according to the quality-diversity decomposition [3] as:

$$L_{ij} = q_i\, S_{ij}\, q_j, \quad (5)$$

where $q_i > 0$ denotes the quality of item $i$ and $S_{ij} \in [-1, 1]$ denotes the similarity of items $i$ and $j$. Using (5), we can write (2) after some manipulation as $P_{\mathbf{L}}(\mathbf{Y} = A) \propto \det(\mathbf{S}_A) \prod_{i \in A} q_i^2$, where the first term denotes the diversity and the second term denotes the quality of the items in $A$. We now provide a geometric interpretation of this probability as follows.
Remark 1.
We can intuitively interpret $\det(\mathbf{L}_A)$ as the squared volume of the parallelepiped spanned by the vectors $\{q_i \phi_i\}_{i \in A}$, where $L_{ij} = q_i \phi_i^{\mathrm{T}} \phi_j q_j$. Thus, items with higher quality $q_i$ are more probable since the corresponding vectors span larger volumes. Also, diverse items are more probable than similar items, since a more orthogonal collection of $\phi_i$s spans a larger volume (see Fig. 1 for an illustration). Thus, a DPP naturally balances the quality and diversity of the items in a subset.
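Remark 1 can be checked with a toy computation. In the sketch below (our example; the three unit-norm feature vectors are made up), two equal-quality items with nearly parallel features yield a much smaller determinant than a diverse, orthogonal pair:

```python
import numpy as np

def sq_volume(Phi, A):
    """det(L_A) with L = Phi^T Phi: the squared volume of the
    parallelepiped spanned by the feature vectors {phi_i : i in A}."""
    sub = Phi[:, list(A)]
    return np.linalg.det(sub.T @ sub)

# Three unit-quality items: items 0 and 1 are nearly parallel (similar),
# while items 0 and 2 are orthogonal (diverse).
Phi = np.column_stack([
    [1.0, 0.0],
    [0.99, np.sqrt(1 - 0.99**2)],
    [0.0, 1.0],
])

print(round(sq_volume(Phi, [0, 2]), 4))  # 1.0    (diverse pair)
print(round(sq_volume(Phi, [0, 1]), 4))  # 0.0199 (similar pair)
```

Since the DPP probability is proportional to these squared volumes, the diverse pair is roughly fifty times more likely than the similar pair here.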
III The Proposed DPPL Framework
III-A Conditional DPPs
Most learning applications are input-driven. For instance, recalling the image search example, a user input is required to show the search results. To model these input-driven problems, we require conditional DPPs. In this framework, let $X$ be an external input and let $\mathcal{Y}(X)$ be the collection of all possible candidate subsets given $X$. The conditional DPP assigns probability to every possible subset $A \subseteq \mathcal{Y}(X)$ as $\mathcal{P}(A \mid X) \propto \det(\mathbf{L}_A(X))$, where $\mathbf{L}(X)$ is a positive semidefinite kernel matrix. Following (2), the normalization constant is $\det(\mathbf{L}(X) + \mathbf{I})$. Now, similar to the decomposition technique in (5), $L_{ij}(X) = q_i(X)\, S_{ij}(X)\, q_j(X)$, where $q_i(X)$ denotes the quality measure of link $i$ and $S_{ij}(X)$ denotes the diversity measure of links $i$ and $j$ ($i \ne j$) given $X$. In [3], the authors proposed a log-linear model for the quality measure as follows:

$$q_i(X) = \exp\!\left(\tfrac{1}{2}\, \boldsymbol{\theta}^{\mathrm{T}} \mathbf{f}_i(X)\right), \quad (6)$$

where $\mathbf{f}_i(X)$ assigns feature values to link $i$. We will discuss the specifics of $\mathbf{f}_i(X)$ in the next section. For $S_{ij}(X)$, we choose the Gaussian kernel evaluated on the diversity features of links $i$ and $j$.
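A conditional kernel with the log-linear quality (6) and a Gaussian similarity kernel can be assembled in a few lines. The sketch below is our illustration: the feature values, locations, and parameter values are arbitrary placeholders, not the ones used in the case study later in the paper.

```python
import numpy as np

def conditional_kernel(feats, locs, theta, sigma):
    """L_ij(X) = q_i(X) S_ij(X) q_j(X): log-linear quality per (6),
    Gaussian similarity over per-item diversity features (here: locations)."""
    q = np.exp(0.5 * feats @ theta)  # quality scores q_i(X)
    d2 = ((locs[:, None, :] - locs[None, :, :]) ** 2).sum(axis=-1)
    S = np.exp(-d2 / sigma**2)       # Gaussian similarity matrix S_ij(X)
    return q[:, None] * S * q[None, :]

feats = np.array([[1.0, 0.2], [0.5, 0.8], [0.9, 0.1]])  # placeholder f_i(X)
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # placeholder locations
L = conditional_kernel(feats, locs, np.array([1.0, -0.5]), sigma=1.0)

# A valid L-ensemble kernel must be symmetric positive semidefinite.
print(np.allclose(L, L.T), bool(np.linalg.eigvalsh(L).min() > -1e-10))
```

Diagonal scaling by the (positive) quality scores preserves the positive semidefiniteness of the Gaussian similarity matrix, so the product is always a valid $L$-ensemble kernel.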
III-B Learning the DPP Model
We now formulate the learning framework of the conditional DPP as follows. We denote the training set as a sequence of ordered pairs $T = \{(X_1, A_1^*), \dots, (X_{|T|}, A_{|T|}^*)\}$, where $X_t$ is the input and $A_t^*$ is the corresponding output subset. Then the learning problem is the maximization of the log-likelihood of $T$:

$$(\boldsymbol{\theta}^*, \sigma^*) = \arg\max_{\boldsymbol{\theta}, \sigma}\; \mathcal{L}(T; \boldsymbol{\theta}, \sigma), \quad (7)$$

where

$$\mathcal{L}(T; \boldsymbol{\theta}, \sigma) = \sum_{t} \log \mathcal{P}_{\boldsymbol{\theta}, \sigma}(A_t^* \mid X_t), \quad (8)$$

and $\mathcal{P}_{\boldsymbol{\theta}, \sigma}$ is the conditional DPP parameterized by $\boldsymbol{\theta}$ and $\sigma$. The reason for choosing the log-linear model for the quality measure and the Gaussian kernel is that, under these models, $\mathcal{L}$ becomes a concave function of the parameters [3, Proposition 4.2].
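The objective (7)-(8) is straightforward to evaluate: each term is $\log\det(\mathbf{L}_{A_t^*}(X_t)) - \log\det(\mathbf{L}(X_t) + \mathbf{I})$. The sketch below is our illustration of just the evaluation step; in practice the kernels come from the parameterized model above and the maximization is carried out by gradient ascent.

```python
import numpy as np

def dppl_loglik(kernels, subsets):
    """Log-likelihood (8) of a training set: a sum over training pairs of
    log det(L_{A_t}) - log det(L_t + I)."""
    ll = 0.0
    for L, A in zip(kernels, subsets):
        idx = np.ix_(list(A), list(A))
        ll += np.linalg.slogdet(L[idx])[1]          # log det(L_{A_t})
        ll -= np.linalg.slogdet(L + np.eye(L.shape[0]))[1]  # log det(L_t + I)
    return ll

# Sanity check: for L = I with 2 items, every subset has probability
# det(I_A)/det(2I) = 1/4, so the log-likelihood of one pair is log(1/4).
print(np.isclose(dppl_loglik([np.eye(2)], [[0]]), np.log(0.25)))  # True
```

Using `slogdet` rather than `det` keeps the computation numerically stable when the kernels are large or ill-conditioned.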
III-C Inference
We now estimate the optimal subset $\hat{A}^*$ for a given input $X$ using the trained conditional DPP. This phase is known as the testing or inference phase. In what follows, we present two methods for choosing $\hat{A}^*$.
III-C1 Sampling from the DPP
The first option is to draw a random sample from the DPP, i.e., $\hat{A} \sim \mathcal{P}_{\boldsymbol{\theta}^*, \sigma^*}(\cdot \mid X)$, and set $\hat{A}^* = \hat{A}$. We now discuss the sampling scheme for a general DPP, which naturally extends to sampling from a conditional DPP. We start by drawing a random sample from a special class of DPPs, known as elementary DPPs, and then use this method to draw samples from a general DPP.
A DPP on $\mathcal{Y}$ is called elementary if every eigenvalue of its marginal kernel lies in $\{0, 1\}$. Thus an elementary DPP can be denoted as $\mathcal{P}^{V}$, where $V = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is a set of orthonormal vectors such that $\mathbf{K} = \sum_{\mathbf{v} \in V} \mathbf{v}\mathbf{v}^{\mathrm{T}}$. We now establish that samples drawn according to $\mathcal{P}^{V}$ always have fixed size.
Lemma 1.
If $\mathbf{Y} \sim \mathcal{P}^{V}$, then $|\mathbf{Y}| = |V|$ almost surely.
Proof:
If $|A| > |V|$, then $\mathcal{P}^{V}(A \subseteq \mathbf{Y}) = \det(\mathbf{K}_A) = 0$, since $\mathrm{rank}(\mathbf{K}) = |V|$. Hence $|\mathbf{Y}| \le |V|$. Now, $\mathbb{E}[|\mathbf{Y}|] = \mathbb{E}\big[\sum_n \mathbf{1}(n \in \mathbf{Y})\big] = \sum_n K_{nn} = \mathrm{trace}(\mathbf{K}) = |V|$. ∎
Our objective is to find a method to draw a length-$k$ sample $\mathbf{Y} = \{y_1, \dots, y_k\}$, where $k = |V|$ by Lemma 1. In what follows, we present an iterated sampling scheme that samples the elements of $\mathbf{Y}$ from $\mathcal{Y}$ without replacement such that the joint probability of obtaining $\mathbf{Y}$ is $\det(\mathbf{K}_{\mathbf{Y}})$. Without loss of generality, we assume $\mathbf{Y} = \{1, \dots, k\}$. Let $\mathbf{V}$ be the matrix whose rows contain the eigenvectors in $V$, and let $\mathbf{v}_i$ denote its $i$-th column. Then $\det(\mathbf{K}_{\mathbf{Y}}) = \mathrm{Vol}^2(\{\mathbf{v}_i\}_{i \in \mathbf{Y}})$, where $\mathrm{Vol}(\cdot)$ is the volume of the parallelepiped spanned by the column vectors ($\mathbf{v}_i$s) of $\mathbf{V}$. Now, $\mathrm{Vol}^2(\{\mathbf{v}_i\}_{i \in \mathbf{Y}}) = \|\mathbf{v}_1\|^2\, \mathrm{Vol}^2(\{\mathbf{v}_i^{\perp \mathbf{v}_1}\}_{i=2}^{k})$, where $\mathbf{v}^{\perp \mathbf{v}_1}$ denotes the projection of $\mathbf{v}$ onto the subspace orthogonal to $\mathbf{v}_1$. Proceeding in the same way,

$$\det(\mathbf{K}_{\mathbf{Y}}) = \|\mathbf{v}_1\|^2\, \|\mathbf{v}_2^{\perp \mathbf{v}_1}\|^2 \cdots \|\mathbf{v}_k^{\perp \{\mathbf{v}_1, \dots, \mathbf{v}_{k-1}\}}\|^2. \quad (9)$$

Thus, the $j$-th step ($1 \le j \le k$) of the sampling scheme, assuming $y_1, \dots, y_{j-1}$ have already been selected, is to select $y_j$ with probability proportional to the squared norm of the corresponding projected column and then project the remaining columns onto the subspace orthogonal to it. By (9), it is guaranteed that the joint probability of obtaining $\{y_1, \dots, y_k\}$ is $\det(\mathbf{K}_{\mathbf{Y}})$.
Having derived the sampling scheme for an elementary DPP, we are in a position to draw samples from a general DPP. The sampling scheme is enabled by the fact that a DPP can be expressed as a mixture of elementary DPPs. This result is formally stated in the following lemma.
Lemma 2.
A DPP with an $L$-ensemble kernel $\mathbf{L} = \sum_{n=1}^{N} \lambda_n \mathbf{v}_n \mathbf{v}_n^{\mathrm{T}}$ is a mixture of elementary DPPs:

$$\mathcal{P}_{\mathbf{L}} = \frac{1}{\det(\mathbf{L} + \mathbf{I})} \sum_{J \subseteq \{1, \dots, N\}} \mathcal{P}^{V_J} \prod_{n \in J} \lambda_n, \quad (10)$$

where $V_J = \{\mathbf{v}_n\}_{n \in J}$.
Proof:
Please refer to [3, Lemma 2.6]. ∎
Thus, given an eigendecomposition of $\mathbf{L}$, the DPP sampling algorithm can be separated into two main steps: (i) sample an elementary DPP $\mathcal{P}^{V_J}$ with probability proportional to $\prod_{n \in J} \lambda_n$, and (ii) sample a sequence of length $|J|$ from the elementary DPP $\mathcal{P}^{V_J}$. The steps discussed thus far are summarized in Alg. 1.
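The two-phase sampler summarized in Alg. 1 can be sketched compactly. The following is our illustrative implementation of the standard eigendecomposition-based DPP sampler; eliminating one column and re-orthonormalizing via QR is one common way to realize the projection step in (9), not necessarily the variant used by the authors.

```python
import numpy as np

def sample_elementary(V, rng):
    """Sample from the elementary DPP spanned by the orthonormal columns
    of V (N x k); by Lemma 1 the sample always has exactly k elements."""
    V = V.copy()
    sample = []
    for _ in range(V.shape[1]):
        p = (V ** 2).sum(axis=1)          # selection probabilities, per (9)
        i = rng.choice(len(p), p=p / p.sum())
        sample.append(i)
        j = np.argmax(np.abs(V[i, :]))    # a column with V[i, j] != 0
        V = V - np.outer(V[:, j], V[i, :] / V[i, j])  # zero out row i
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)        # re-orthonormalize the rest
    return sorted(sample)

def sample_dpp(L, rng):
    """Two-phase sampling (Alg. 1): (i) pick an elementary DPP, keeping
    eigenvector v_n independently with probability lambda_n/(lambda_n + 1),
    which realizes the mixture weights of Lemma 2; (ii) sample from it."""
    lam, V = np.linalg.eigh(L)
    keep = rng.random(len(lam)) < lam / (lam + 1.0)
    if not keep.any():
        return []                          # the empty elementary DPP
    return sample_elementary(V[:, keep], rng)

rng = np.random.default_rng(0)
# With V = the first two standard basis vectors, the elementary DPP is
# deterministic: the sample is always {0, 1}.
print(sample_elementary(np.eye(3)[:, :2], rng))  # [0, 1]
```

Each call costs a single eigendecomposition plus $O(Nk^2)$ work for the projections, which is what makes sampling-based inference cheap relative to exhaustive search.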
III-C2 MAP Inference
A more formal technique is to obtain the maximum a posteriori (MAP) estimate, i.e., $\hat{A}^* = \arg\max_{A} \mathcal{P}(A \mid X)$. However, finding the exact MAP estimate is an NP-hard problem because of the exponential-order search space over the subsets of $\mathcal{Y}(X)$. Nevertheless, one can construct a computationally efficient approximate MAP inference algorithm with complexity similar to that of random sampling. Due to space limitations, a more formal discussion of these approximation techniques is outside the scope of this paper. We refer to [9] for one possible near-optimal MAP inference scheme for DPPs, which is used in the numerical simulations.
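As a concrete baseline, a simple greedy ascent on $\log\det(\mathbf{L}_Y)$ is often used for approximate DPP MAP inference. The sketch below is our illustration of that generic greedy baseline, not the specific near-optimal scheme of [9]:

```python
import numpy as np

def greedy_map(L):
    """Greedy approximate MAP for a DPP: repeatedly add the item with the
    largest positive gain in log det(L_Y); stop when no item helps."""
    N = L.shape[0]
    Y = []

    def logdet(S):
        return 0.0 if not S else np.linalg.slogdet(L[np.ix_(S, S)])[1]

    while len(Y) < N:
        base = logdet(Y)
        gain, i = max((logdet(Y + [j]) - base, j)
                      for j in range(N) if j not in Y)
        if gain <= 0:
            break
        Y.append(i)
    return sorted(Y)

# Diagonal kernel: items are independent, and only items with L_ii > 1
# increase the determinant, so the greedy MAP set is exactly {i : L_ii > 1}.
print(greedy_map(np.diag([2.0, 0.5, 3.0])))  # [0, 2]
```

Each iteration touches at most $N$ candidate items, so the cost is polynomial in $N$, in contrast to the $2^N$ subsets of the exact search.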
IV Case Study: Link Scheduling
We now introduce the link scheduling problem, to which we apply the DPPL framework discussed in the previous section.
IV-A System Model
We consider a wireless network with $N$ Tx-Rx pairs with fixed link distance $d$. The network can be represented as a directed bipartite graph $\mathcal{G} = (\mathcal{T}, \mathcal{R}, \mathcal{E})$, where $\mathcal{T}$ and $\mathcal{R}$ are the independent sets of vertices denoting the sets of Txs and Rxs, respectively, and $\mathcal{E}$ is the set of directed edges from each Tx to its dedicated Rx. Since each Tx has its dedicated Rx, the in-degree and out-degree of each node in $\mathcal{T}$ and $\mathcal{R}$ are one. Also, $|\mathcal{T}| = |\mathcal{R}| = |\mathcal{E}| = N$. An illustration of the network topology is presented in Fig. 2. Let $\bar{\mathcal{G}}$ be the complete weighted bipartite graph on $\mathcal{T} \cup \mathcal{R}$ with edge weights $h_{ij}$ for all Tx-Rx pairs, where $h_{ij}$ denotes the channel gain between Tx $i$ and Rx $j$.
IV-B Problem Formulation
We assume that each link can be either in an active or an inactive state. A link is active when its Tx transmits at a power level $p_{\mathrm{h}}$ and inactive when its Tx transmits at a power level $p_{\ell}$ (with $p_{\ell} < p_{\mathrm{h}}$). Each link transmits over the same frequency band, whose bandwidth is assumed to be unity. Then the rate on link $i$ is given by $\log_2(1 + \gamma_i)$, where $\gamma_i$ is the SINR at Rx $i$: $\gamma_i = p_i h_{ii} / \big(N_0 + \sum_{j \ne i} p_j h_{ji}\big)$. Here $N_0$ is the thermal noise power. The sum-rate maximization problem can be expressed as follows.
$$\max_{\{p_i\}} \; \sum_{i=1}^{N} \log_2(1 + \gamma_i) \quad (11a)$$

$$\text{subject to} \quad p_i \in \{p_{\ell}, p_{\mathrm{h}}\}, \; \forall\, i, \quad (11b)$$

where the variables are the link powers $\{p_i\}$. An optimal subset of simultaneously active links, denoted as $A^*$, is the solution of (11). Thus, $p_i = p_{\mathrm{h}}$ for $i \in A^*$ and $p_i = p_{\ell}$ otherwise.
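The objective (11a) is easy to evaluate for any candidate power assignment. The sketch below is our illustration with made-up gains, taking the inactive power level to be zero for simplicity; `h[j, i]` is the gain from Tx $j$ to Rx $i$.

```python
import numpy as np

def sum_rate(h, p, noise):
    """Sum-rate (11a): sum_i log2(1 + SINR_i), with
    SINR_i = p_i h_ii / (noise + sum_{j != i} p_j h_ji)."""
    N = len(p)
    total = 0.0
    for i in range(N):
        interference = sum(p[j] * h[j, i] for j in range(N) if j != i)
        total += np.log2(1.0 + p[i] * h[i, i] / (noise + interference))
    return total

# Two symmetric links: direct gain 1.0, cross gain 0.1, active power 1.0.
h = np.array([[1.0, 0.1], [0.1, 1.0]])
both_active = sum_rate(h, [1.0, 1.0], noise=0.1)  # 2 * log2(6)
one_active = sum_rate(h, [1.0, 0.0], noise=0.1)   # log2(11)
print(both_active > one_active)  # True: mutual interference is mild here
```

With stronger cross gains the comparison flips, which is exactly the quality-diversity tension the DPP is meant to learn: activating nearby links costs more in interference than it adds in rate.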
IV-C Optimal Solution
The optimization problem in (11) is NP-hard [8]. However, for bipartite networks the problem can be solved by a low-complexity heuristic algorithm based on GP (see Alg. 2). For completeness, we provide the rationale behind its formulation in Appendix A. For further details on solving the general class of link scheduling problems, the reader is referred to [8]. Fig. 2 demonstrates a realization of the network and the subset $A^*$ chosen by Alg. 2.
$$\underset{\{p_i\}, \{\gamma_i\}}{\text{minimize}} \;\; \prod_{i=1}^{N} \gamma_i^{-a_i} \quad (12a)$$

$$\text{subject to} \;\; \beta^{-1}\hat{\gamma}_i \le \gamma_i \le \beta\,\hat{\gamma}_i, \; \forall\, i, \quad (12b)$$

$$\gamma_i\big(N_0 + \textstyle\sum_{j \ne i} p_j h_{ji}\big)\, p_i^{-1} h_{ii}^{-1} \le 1, \; \forall\, i, \quad (12c)$$

$$p_{\ell} \le p_i \le p_{\mathrm{h}}, \; \forall\, i. \quad (12d)$$
IV-D Estimation of the Optimal Subset with DPPL
We now model the problem of optimal subset selection with DPPL. We train the DPP on a sequence of networks and the optimal subsets obtained by Alg. 2. For the training phase, we set the input $X$ as the realization of the network and the output as its optimal subset $A^*$. The quality features of link $i$ are built from its received signal power and its two strongest interfering powers, and the diversity measure is the Gaussian kernel evaluated on the locations of the Txs and Rxs of links $i$ and $j$. The ground set of the DPP is the set of links $\mathcal{E}$. We denote the subset estimated by DPPL in the testing phase as $\hat{A}^*$. The block diagram of DPPL is illustrated in Fig. 3. In order to ensure the reproducibility of the results, we provide the Matlab implementation of DPPL for this case study in [10].
IV-E Results and Discussion
We now demonstrate the performance of DPPL through numerical simulations. We construct the network by distributing $N$ links with fixed link distance $d$ uniformly at random within a disc. We assume the channel gain is dominated by power-law path loss, i.e., $h_{ij} = \|\mathbf{x}^{\mathrm{t}}_i - \mathbf{x}^{\mathrm{r}}_j\|^{-\alpha}$, where $\mathbf{x}^{\mathrm{t}}_i$ and $\mathbf{x}^{\mathrm{r}}_j$ denote the locations of Tx $i$ and Rx $j$, respectively, and $\alpha$ is the path-loss exponent. The networks for the training and testing phases were generated with the same parameter settings, and the power levels and noise power were fixed in dB. The training set was constructed from independent realizations of the network. Note that varying $N$ did not change the learned parameters ($\boldsymbol{\theta}^*$, $\sigma^*$) significantly. In Fig. 4, we plot the empirical cumulative distribution functions (CDFs) of the sum-rates obtained by Alg. 2 and DPPL. We observe that the sum-rate obtained by the DPPL framework closely approximates the optimal sum-rate. We also notice that DPP MAP inference gives better sum-rate estimates than DPP sampling. We further compare the performance with the well-known SG-based model where the simultaneously active links are modeled as an independent thinning of the actual network [7]. In particular, each link is activated according to an independent and identically distributed (i.i.d.) Bernoulli random variable with probability $\xi$. We estimate $\xi$ by averaging the activation of a randomly selected link, which is equivalent to $\xi = \mathbb{E}[\mathbf{1}(i \in A^*)]$ for a fixed link $i$. We see that the sum-rate under independent thinning is significantly lower than the one predicted by the DPP. The reason is that the independent thinning scheme is not rich enough to capture the spatial repulsion which exists across the simultaneously active links.

IV-E1 Runtime Comparison
Another key strength of the proposed DPPL appears when we compare its runtime in the testing phase with that of Alg. 2 applied to the same network. In Fig. 5, we plot the runtimes of the different subset selection schemes for different network sizes. The absolute runtimes were obtained by averaging over many iterations of all the schemes in the same computation environment. In order to obtain a unit-free measure, we normalize these absolute values by the average absolute runtime of Alg. 2 at a reference network size. We observe that DPPL is substantially faster than Alg. 2. The runtime of Alg. 2 increases exponentially with $N$, whereas the runtimes of DPPL scale as some polynomial order of $N$.
Note that DPPL is not just a sum-rate estimator for the network; it estimates the optimal subset of links significantly faster than the optimization algorithms. Thus, DPPL can be implemented in real networks to determine $\hat{A}^*$ even when the network size is large. In Fig. 6, we plot the sum-rates averaged over network realizations for a given value of $N$. Note that evaluating sum-rates for higher values of $N$ using Alg. 2 is nearly impossible due to its exponentially increasing runtime. Quite interestingly, DPPL, thanks to its fast computation, provides some crisp insights into the network behavior: as more links are added, the estimated sum-rate tends to saturate (see Fig. 6). This is expected because, as long as the resources are fixed, there will be a limit on the number of simultaneously active links (irrespective of $N$) that maximizes the sum-rate. If the number of active links exceeds this limit, the sum-rate may decrease because of the increased interference. We also observe that the performance gap between MAP inference and DPP sampling increases significantly at higher values of $N$.
V Conclusion
In this paper, we identified a general class of subset selection problems in wireless networks which can be solved by jointly leveraging ML and SG, two fundamentally different mathematical tools used in communications and networking. To solve these problems, we developed the DPPL framework, where the DPP originates in SG and its learning applications have been fine-tuned by the ML community. When applied to the special case of wireless link scheduling, we found that the DPP is able to learn the underlying quality-diversity tradeoff of the optimal subsets of simultaneously active links. This work has numerous extensions. From the SG perspective, it is of interest to compute analytical expressions of the key performance metrics of the network, such as the mean interference at a typical receiver or the average rate, by leveraging the analytical tractability of DPPs. From the ML perspective, DPPL can be extended to include time as another dimension and solve subset selection problems over time (e.g., scheduling problems in cellular networks, such as proportional fair scheduling) using the space-time version of DPPL (also known as the dynamic DPP [11]). From the application side, this framework can be used to solve other subset selection problems, such as user group selection in a downlink multiuser multiple-input multiple-output (MIMO) setting.
Appendix A: Formulation of Alg. 2
Since (11) is an integer programming problem, the first step is to solve a relaxed version of the problem assuming continuous power allocations. In particular, we relax the integer constraint (11b) to $p_{\ell} \le p_i \le p_{\mathrm{h}}$. Since $\log_2(\cdot)$ is an increasing function, the problem can be restated as:

$$\min_{\{p_i\}, \{\gamma_i\}} \; \prod_{i=1}^{N} (1 + \gamma_i)^{-1} \quad (13a)$$

$$\text{s.t.} \quad \gamma_i = \frac{p_i h_{ii}}{N_0 + \sum_{j \ne i} p_j h_{ji}}, \; \forall\, i, \quad (13b)$$

$$p_{\ell} \le p_i \le p_{\mathrm{h}}, \; \forall\, i. \quad (13c)$$
Since the objective function is decreasing in $\gamma_i$, we can replace the equality in (13b) with an inequality. Using the auxiliary variables $\{t_i\}$, (13) can be formulated as:

$$\min \; \prod_{i=1}^{N} t_i^{-1} \quad (14a)$$

$$\text{s.t.} \quad t_i \le 1 + \gamma_i, \; \forall\, i, \quad (14b)$$

$$\gamma_i\big(N_0 + \textstyle\sum_{j \ne i} p_j h_{ji}\big)\, p_i^{-1} h_{ii}^{-1} \le 1, \; \forall\, i, \quad (14c)$$

$$p_{\ell} \le p_i \le p_{\mathrm{h}}, \; \forall\, i. \quad (14d)$$
Now, in (14), we observe that (14a) is a monomial function, (14b) contains a posynomial function on the right-hand side (RHS), and all the constraints contain either monomial or posynomial functions. Hence, (14) is a complementary GP [12]. If the posynomial in (14b) can be replaced by a monomial, (14) becomes a standard GP. Since GPs can be reformulated as convex optimization problems, they can be solved efficiently irrespective of the scale of the problem. One way of approximating (14) with a GP at a given point $\hat{\gamma}_i$ is to replace the posynomial $1 + \gamma_i$ by a monomial $c_i \gamma_i^{a_i}$. From $c_i \hat{\gamma}_i^{a_i} = 1 + \hat{\gamma}_i$ and matching first derivatives at $\hat{\gamma}_i$, we get

$$a_i = \frac{\hat{\gamma}_i}{1 + \hat{\gamma}_i}, \qquad c_i = (1 + \hat{\gamma}_i)\,\hat{\gamma}_i^{-a_i}. \quad (15)$$
Also note that $1 + \gamma_i \ge c_i \gamma_i^{a_i}$ for all $\gamma_i > 0$ when $a_i$ and $c_i$ are chosen as in (15). Thus the local approximation of (14) will still satisfy the original constraint (14b). The modified inequality constraint becomes

$$t_i \le c_i \gamma_i^{a_i}, \; \forall\, i, \quad (16)$$

where $c_i$ and $a_i$ are obtained by (15).
Since (14a) is a decreasing function of $t_i$, we can substitute $t_i$ with its maximum value $c_i \gamma_i^{a_i}$ that satisfies the other inequality constraints. Thus, $t_i$ can be eliminated as:

$$\prod_{i=1}^{N} t_i^{-1} = \prod_{i=1}^{N} c_i^{-1} \gamma_i^{-a_i} = C \prod_{i=1}^{N} \gamma_i^{-a_i}, \quad (17)$$
where $C = \prod_i c_i^{-1}$ is a constant which does not affect the minimization problem. Thus, each iteration of the heuristic runs as follows. Let $\hat{\boldsymbol{\gamma}}$ be the current guess of the SINR values. The GP provides a better solution around the current guess, which is set as the initial guess in the next iteration, i.e., $\hat{\boldsymbol{\gamma}} \leftarrow \boldsymbol{\gamma}^*$, unless a termination criterion is satisfied. These steps are summarized in Alg. 2. To ensure that the GP does not drift away from the initial guess $\hat{\boldsymbol{\gamma}}$, a new constraint (12b) is added so that $\boldsymbol{\gamma}$ remains in a local neighborhood of $\hat{\boldsymbol{\gamma}}$. Here $\beta$ is the control parameter: the smaller the value of $\beta$, the higher the accuracy of the monomial approximation, but the slower the convergence speed, so $\beta$ is set for a reasonable tradeoff between accuracy and speed. The algorithm terminates with a quantization step which assigns the discrete power levels $p_{\ell}$ and $p_{\mathrm{h}}$: once we obtain the continuous optimal power allocation $\{p_i^*\}$, we quantize it into the two levels by setting $p_i = p_{\ell}$ whenever $p_i^*$ lies below some threshold level and $p_i = p_{\mathrm{h}}$ otherwise.
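The approximation in (15)-(16) can be checked numerically. The sketch below (our illustration) fits the monomial $c\,\gamma^{a}$ to the posynomial $1 + \gamma$ at a point $\hat{\gamma}$ and verifies both the tangency and the global lower bound that keeps (14b) satisfied:

```python
import numpy as np

def monomial_approx(g_hat):
    """Monomial fit c * g**a to 1 + g at g = g_hat, per (15):
    a = g_hat / (1 + g_hat), c = (1 + g_hat) * g_hat**(-a)."""
    a = g_hat / (1.0 + g_hat)
    c = (1.0 + g_hat) * g_hat ** (-a)
    return c, a

g_hat = 2.0
c, a = monomial_approx(g_hat)

# Tangency: the monomial matches the posynomial value at g_hat.
print(np.isclose(c * g_hat**a, 1.0 + g_hat))  # True

# Global lower bound: c * g**a <= 1 + g for g > 0, since log(1 + e^y) is
# convex in y = log(g) and the monomial is its log-space tangent. Hence any
# point feasible for (16) also satisfies the original constraint (14b).
g = np.linspace(0.1, 10.0, 100)
print(bool(np.all(c * g**a <= 1.0 + g + 1e-12)))  # True
```

This lower-bound property is what makes the iterated GP a conservative inner approximation: every iterate is feasible for the original relaxed problem (14).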
References
 [1] O. Simeone, “A very brief introduction to machine learning with applications to communication systems,” IEEE Trans. on Cognitive Commun. and Networking, vol. 4, no. 4, pp. 648–664, Dec. 2018.
 [2] A. Nenkova, L. Vanderwende, and K. McKeown, “A compositional context sensitive multidocument summarizer: exploring the factors that influence summarization,” in Proc. SIGIR. ACM, 2006, pp. 573–580.
 [3] A. Kulesza, B. Taskar et al., “Determinantal point processes for machine learning,” Foundations and Trends in Machine Learning, vol. 5, no. 2–3, pp. 123–286, 2012.
 [4] A. Krause, A. Singh, and C. Guestrin, “Nearoptimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies,” Journal of Machine Learning Research, vol. 9, no. Feb, pp. 235–284, 2008.
 [5] Y. Li, F. Baccelli, H. S. Dhillon, and J. G. Andrews, “Statistical modeling and probabilistic analysis of cellular networks with determinantal point processes,” IEEE Trans. on Commun., vol. 63, no. 9, pp. 3405–3422, 2015.
 [6] N. Miyoshi and T. Shirai, “A cellular network model with Ginibre configured base stations,” Advances in Applied Probability, vol. 46, no. 3, pp. 832–845, 2014.
 [7] B. Błaszczyszyn and P. Keeler, “Determinantal thinning of point processes with network learning applications,” 2018, available online: arXiv:1810.08672.
 [8] P. C. Weeraddana, M. Codreanu, M. Latva-aho, A. Ephremides, C. Fischione et al., “Weighted sum-rate maximization in wireless networks: A review,” Foundations and Trends in Networking, vol. 6, no. 1–2, pp. 1–163, 2012.
 [9] J. Gillenwater, A. Kulesza, and B. Taskar, “Nearoptimal MAP inference for determinantal point processes,” in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 2735–2743.
 [10] C. Saha and H. S. Dhillon, “Matlab code for determinantal point process learning,” 2019, available at: github.com/stochasticgeometry/DPPL.
 [11] T. Osogami, R. H. Putra, A. Goel, T. Shirai, and T. Maehara, “Dynamic determinantal point processes,” in Proc. AAAI, 2018.
 [12] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, “A tutorial on geometric programming,” Optimization and Engineering, vol. 8, no. 1, p. 67, Apr. 2007.