Machine Learning meets Stochastic Geometry: Determinantal Subset Selection for Wireless Networks

05/01/2019
by Chiranjib Saha, et al.

In wireless networks, many problems can be formulated as subset selection problems where the goal is to select a subset from the ground set that maximizes some objective function. These problems are typically NP-hard and hence solved through carefully constructed heuristics, which are themselves mostly NP-complete and thus not easily applicable to large networks. On the other hand, subset selection problems occur in a slightly different context in machine learning (ML) where the goal is to select a subset of high-quality yet diverse items from a ground set. In this paper, we introduce a novel determinantal point process (DPP) based learning (DPPL) framework for efficiently solving subset selection problems in wireless networks. The DPPL is intended to replace the traditional optimization algorithms for subset selection by learning the quality-diversity trade-off in the optimal subsets selected by an optimization routine. As a case study, we apply DPPL to the wireless link scheduling problem, where the goal is to determine the subset of simultaneously active links which maximizes the network-wide sum-rate. We demonstrate that the proposed DPPL approaches the optimal solution with significantly lower computational complexity than the popular optimization algorithms used for this problem in the literature.



I Introduction

Machine learning (ML) and stochastic geometry (SG) have recently found many applications in the design and analysis of wireless networks. However, since the nature of the problems studied with these tools is so fundamentally different, it is rare to find a common ground where the strengths of both can be jointly leveraged. While the foundation of wireless networks is built on traditional probabilistic models (such as channel, noise, interference, and queuing models), ML is changing this model-driven approach to a more data-driven, simulation-based approach by learning the models from extensive datasets obtained from real networks or field trials [1]. On the other hand, the basic premise of SG is to enhance the model-driven approach by endowing distributions on the locations of the transmitters (Tx-s) and receivers (Rx-s), so that one can derive exact and tractable expressions for key performance metrics such as interference, coverage, and rate. In this paper, we concretely demonstrate that these two mathematical tools can be jointly applied to a class of problems known as subset selection problems, which have numerous applications in wireless networks.

Subset selection problems. In wireless networks, a wide class of resource management problems like power/rate control, link scheduling, network utility maximization, and beamformer design fall into the category of subset selection problems, where a subset from a ground set needs to be chosen which optimizes a given objective function. In most cases, finding the optimal subset is NP-hard. The common practice in the literature is to design heuristic algorithms which find a local optimum under reasonable complexity. Even most of these heuristic approaches are NP-complete and are hence difficult to implement when the network size grows large.

In ML, subset selection problems appear in a slightly different context where the primary objective is to preserve the balance between quality and diversity of the items in the subset, i.e., to select good quality items from a ground set which are also non-overlapping in terms of their features. For example, assume that a user is searching for images of New York in a web browser. The image search engine will pick a subset of stock images related to New York from the image library which contains the popular landmarks (quality), while also ensuring that one particular landmark does not occur repeatedly in the search result (diversity). A few more examples of subset selection with diversity are text summarization [2], citation management [3], and sensor placement [4]. The attempt to model diversity among the items in a subset selection problem brings us to the probabilistic models constructed by DPPs, which lie at the intersection of ML and SG. Initially formulated as a repulsive point process in SG [5], DPPs are a natural choice for inducing diversity or negative correlation between the items in a subset. Although the traditional theoretical development of DPPs has focused on continuous spaces, the finite version of DPPs has recently emerged as a useful probabilistic model for subset selection problems with a quality-diversity trade-off in ML. This is due to the fact that finite DPPs are amenable to the data-driven learning and inference framework of ML [3].

Relevant prior art on DPPs. In wireless networks, DPPs have mostly been used in the SG-based modeling and analysis of cellular networks. In these models, DPPs are used to capture spatial repulsion in the base station (BS) locations, which cannot be modeled using the more popular Poisson point process (PPP) [5]. For some specific DPPs, for instance the Ginibre point process, it is possible to analytically characterize the performance metrics of the network such as the coverage probability [6]. However, the finite DPPs and the associated data-driven learning framework, which is under rapid development in the ML community, have not found any notable application in wireless networks. The only existing work is [7], where the authors have introduced a new class of data-driven SG models using DPPs and have trained them to mimic the properties of some hard-core point processes used for wireless network modeling (such as the Matérn type-II process) in a finite window.

Contributions. The key technical contribution of this paper is the novel DPPL framework for solving general subset selection problems in wireless networks. In order to concretely demonstrate the proposed DPPL framework, we apply it to the link scheduling problem, which is a classical subset selection problem in wireless networks. The objective is to assign optimal binary power levels to Tx-Rx pairs so as to maximize the sum-rate [8]. The links transmitting at the higher (lower) power level will be termed active (inactive) links. Therefore, the objective is to determine the optimal subset of simultaneously active links. Similar to the subset selection problems in ML, the simultaneously active links will be selected by balancing quality and diversity. The links which will be naturally favored are the ones with better link quality in terms of signal-to-interference-and-noise ratio (SINR), so that the rates on these links contribute more to the sum-rate (quality). On the other hand, the simultaneously active links will have some degree of spatial repulsion to avoid mutual interference (diversity). With this insight, it is reasonable to treat the set of active links in the optimal solution as a DPP over the set of links in a given network. The DPP is trained on a sequence of networks and their optimal subsets, which are generated by an optimization algorithm based on geometric programming (GP). We observe that the sum-rates of the estimated optimal subsets generated by the trained DPP closely approach the optimal sum-rates. Moreover, we show that subset selection using the DPP is significantly more computationally efficient than optimization-based subset selection methods.

II Determinantal point process: Preliminaries

In this section, we provide a concise introduction to DPPs on finite sets. The interested reader may refer to [3] for a more pedagogical treatment of the topic as well as an extensive survey of the prior art. In general, DPPs are probabilistic models that quantify the likelihood of selecting a subset of items as the determinant of a kernel matrix ($K$). More formally, if $\mathcal{Y} = \{1, 2, \dots, N\}$ is a discrete set of $N$ items, a DPP is a probability measure $\mathcal{P}$ on the power set $2^{\mathcal{Y}}$ which is defined as:

$\mathcal{P}(A \subseteq \mathbf{Y}) = \det(K_A)$,   (1)

where $\mathbf{Y}$ is a random subset of $\mathcal{Y}$ and $K_A$ denotes the restriction of $K$ to the indices of the elements of $A$ ($K_A := [K_{ij}]_{i,j \in A}$). We denote $K$ as the marginal kernel, which is a positive semidefinite matrix such that $K \preceq I$ ($I$ is the identity matrix), i.e., all eigenvalues of $K$ are less than or equal to 1. For learning purposes, it is more useful to define the DPP with another formalism known as the $L$-ensemble. A DPP can alternatively be defined in terms of a positive semidefinite matrix $L$ ($L \succeq 0$) indexed by the elements of $\mathcal{Y}$:

$\mathcal{P}_L(\mathbf{Y} = Y) = \dfrac{\det(L_Y)}{\sum_{Y' \subseteq \mathcal{Y}} \det(L_{Y'})} = \dfrac{\det(L_Y)}{\det(L + I)}$,   (2)

where $Y \subseteq \mathcal{Y}$. The last step follows from the identity $\sum_{Y \subseteq \mathcal{Y}} \det(L_Y) = \det(L + I)$ (see [3, Theorem 2.1] for the proof). Following [3, Theorem 2.2], $K$ and $L$ are related by the following equation:

$K = L(L + I)^{-1} = I - (L + I)^{-1}$.   (3)

Since $L$ is real and symmetric by definition, its eigendecomposition is $L = \sum_{n=1}^{N} \lambda_n \mathbf{v}_n \mathbf{v}_n^{\top}$, where $\{\mathbf{v}_n\}$ is the orthonormal sequence of eigenvectors corresponding to the eigenvalues $\{\lambda_n\}$. Using (3), $K$ can also be obtained by rescaling the eigenvalues of $L$ as:

$K = \sum_{n=1}^{N} \dfrac{\lambda_n}{\lambda_n + 1} \mathbf{v}_n \mathbf{v}_n^{\top}$.   (4)
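
To make (2)-(4) concrete, the following NumPy sketch (ours, not part of the paper; names are illustrative) builds a small random $L$-ensemble, computes its marginal kernel via (3), and verifies by exhaustive enumeration that the probabilities in (2) sum to one and that the singleton marginals equal the diagonal of $K$:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    N = 4
    B = rng.standard_normal((N, N))
    L = B @ B.T                                   # random PSD L-ensemble kernel

    K = L @ np.linalg.inv(L + np.eye(N))          # marginal kernel, eq. (3)
    Z = np.linalg.det(L + np.eye(N))              # normalizer det(L + I), eq. (2)

    total, marginal = 0.0, np.zeros(N)
    for r in range(N + 1):
        for Y in itertools.combinations(range(N), r):
            detY = np.linalg.det(L[np.ix_(Y, Y)]) if Y else 1.0
            pY = detY / Z                         # P_L(Y = Y), eq. (2)
            total += pY
            marginal[list(Y)] += pY

    assert np.isclose(total, 1.0)                 # probabilities sum to one
    assert np.allclose(marginal, np.diag(K))      # P(i in Y) = K_ii, eq. (1)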

In the ML formalism, if $\mathbf{x}_i$ is some vector representation of the $i$-th item of $\mathcal{Y}$, then $L$ can be interpreted as a kernel matrix, i.e., $L_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$, where $k$ is a kernel function and $\phi$ is the corresponding feature map. The kernel $L$ can be further decomposed according to the quality-diversity decomposition [3] as:

$L_{ij} = q_i S_{ij} q_j$,   (5)

where $q_i \in \mathbb{R}^{+}$ denotes the quality of item $i$ ($i \in \mathcal{Y}$) and $S_{ij} \in [-1, 1]$ denotes the similarity of items $i$ and $j$ ($i, j \in \mathcal{Y}$). Using (5), we can write (2) after some manipulation as: $\mathcal{P}_L(\mathbf{Y} = Y) \propto \det(S_Y) \prod_{i \in Y} q_i^2$, where the first term denotes the diversity and the second term the quality of the items in $Y$. We now provide a geometric interpretation of $\mathcal{P}_L$ as follows.

Remark 1.

We can intuitively interpret $\det(L_Y)$ as the squared volume of the parallelepiped spanned by the vectors $\{q_i \boldsymbol{\phi}_i\}_{i \in Y}$, where $\boldsymbol{\phi}_i := \phi(\mathbf{x}_i)$ with $\|\boldsymbol{\phi}_i\| = 1$. Thus, items with higher quality $q_i$ are more probable since the corresponding vectors $q_i \boldsymbol{\phi}_i$ span larger volumes. Also, diverse items are more probable than similar items, since a more orthogonal collection of $\boldsymbol{\phi}_i$-s spans a larger volume (see Fig. 1 for an illustration). Thus a DPP naturally balances the quality and diversity of the items in a subset.

(a) $q_i$ increases.
(b) $S_{ij}$ increases.
Fig. 1: In a DPP, the probability of occurrence of a set $Y$ depends on the volume of the parallelepiped with side lengths proportional to $q_i$ and angles determined by $S_{ij}$: (a) as $q_i$ increases, the volume increases; (b) as $S_{ij}$ increases, the volume decreases.
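
As a quick numerical check of Remark 1 (our own sketch, not from the paper), the following snippet builds a two-item kernel via (5) and shows that $\det(L_Y)$ grows with quality and shrinks with similarity:

    import numpy as np

    def det_LY(q1, q2, s12):
        # two-item quality-diversity kernel, eq. (5): L_ij = q_i S_ij q_j
        L = np.array([[q1 * q1,       q1 * s12 * q2],
                      [q2 * s12 * q1, q2 * q2      ]])
        return np.linalg.det(L)      # squared volume of the parallelepiped

    print(det_LY(1.0, 1.0, 0.5))     # baseline: 0.75
    print(det_LY(2.0, 1.0, 0.5))     # higher quality -> larger det (3.0)
    print(det_LY(1.0, 1.0, 0.9))     # higher similarity -> smaller det (0.19)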

III The Proposed DPPL framework

III-A Conditional DPPs

Most learning applications are input-driven. For instance, recalling the image search example, a user input is required to show the search results. To model these input-driven problems, we require conditional DPPs. In this framework, let $X$ be an external input and let $\mathcal{Y}(X)$ be the collection of all possible candidate items given $X$. The conditional DPP assigns a probability to every possible subset $Y \subseteq \mathcal{Y}(X)$ as $\mathcal{P}(Y|X) \propto \det(L_Y(X))$, where $L(X)$ is a positive semidefinite kernel matrix. Following (2), the normalization constant is $\det(L(X) + I)$. Now, similar to the decomposition technique in (5), $L_{ij}(X) = q_i(X)\, S_{ij}(X)\, q_j(X)$, where $q_i(X)$ denotes the quality measure of link $i$ and $S_{ij}(X)$ denotes the diversity measure of links $i$ and $j$ ($i, j \in \mathcal{Y}(X)$) given $X$. In [3], the authors proposed a log-linear model for the quality measure as follows:

$q_i(X) = \exp\big(\boldsymbol{\theta}^{\top} \mathbf{f}_i(X)\big)$,   (6)

where $\mathbf{f}_i(X)$ assigns feature values to item $i$. We will discuss the specifics of $\mathbf{f}_i(X)$ in the next section. For $S_{ij}(X)$, we choose the Gaussian kernel: $S_{ij}(X) = \exp\big(-\|\mathbf{g}_i(X) - \mathbf{g}_j(X)\|^2 / \sigma^2\big)$, where $\mathbf{g}_i(X)$ denotes a diversity feature vector of item $i$.
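
A minimal sketch of assembling $L(X)$ from (6) and the Gaussian kernel (our code; build_kernel and the toy features are illustrative, not from the paper or [10]):

    import numpy as np

    def build_kernel(F, G, theta, sigma):
        """F: N x m quality features f_i(X); G: N x d diversity features g_i(X)."""
        q = np.exp(F @ theta)                    # log-linear quality, eq. (6)
        D2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)
        S = np.exp(-D2 / sigma ** 2)             # Gaussian similarity kernel
        return q[:, None] * S * q[None, :]       # L_ij(X) = q_i S_ij q_j

    # toy example: 5 items, 3 quality features, 2-D diversity features
    rng = np.random.default_rng(1)
    L = build_kernel(rng.random((5, 3)), rng.random((5, 2)),
                     theta=np.array([0.5, -0.2, 1.0]), sigma=0.7)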

III-B Learning the DPP model

We now formulate the learning framework of the conditional DPP as follows. We denote the training set as a sequence of $K$ ordered pairs $\mathcal{T} := \{(X^{(1)}, Y^{(1)}), \dots, (X^{(K)}, Y^{(K)})\}$, where $X^{(k)}$ is the input and $Y^{(k)}$ is the corresponding output. Then the learning problem is the maximization of the log-likelihood of $\mathcal{T}$:

$(\boldsymbol{\theta}^*, \sigma^*) = \arg\max_{\boldsymbol{\theta}, \sigma}\; \mathcal{L}(\mathcal{T}; \boldsymbol{\theta}, \sigma)$,   (7)

where

$\mathcal{L}(\mathcal{T}; \boldsymbol{\theta}, \sigma) = \sum_{k=1}^{K} \log \mathcal{P}_{\boldsymbol{\theta}, \sigma}\big(Y^{(k)} \,\big|\, X^{(k)}\big)$,   (8)

and $\mathcal{P}_{\boldsymbol{\theta}, \sigma}$ denotes the conditional DPP parameterized by $\boldsymbol{\theta}$ and $\sigma$. The reason for choosing the log-linear model for the quality measure and the Gaussian kernel for the diversity measure is that, under these models, $\mathcal{L}(\mathcal{T}; \boldsymbol{\theta}, \sigma)$ becomes a concave function of $\boldsymbol{\theta}$ and $\sigma$ [3, Proposition 4.2].
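
A minimal sketch of the objective (8), reusing build_kernel from the previous sketch (again our code, not the authors' Matlab implementation [10]); any off-the-shelf optimizer can then be used for (7):

    import numpy as np

    def log_likelihood(data, theta, sigma):
        """data: list of (F, G, Y) triples, one per training input X^(k);
        Y is the index list of the optimal subset for that input."""
        ll = 0.0
        for F, G, Y in data:
            L = build_kernel(F, G, theta, sigma)
            _, logdet_Y = np.linalg.slogdet(L[np.ix_(Y, Y)])
            _, logdet_Z = np.linalg.slogdet(L + np.eye(L.shape[0]))
            ll += logdet_Y - logdet_Z          # log P(Y | X), eq. (2)
        return ll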

III-C Inference

We now estimate the optimal subset $\hat{Y}$ for a given input $X$ using the trained conditional DPP. This phase is known as the testing or inference phase. In what follows, we present two methods for choosing $\hat{Y}$.

III-C1 Sampling from DPP

The first option is to draw a random sample from the DPP, i.e., $Y \sim \mathcal{P}_{\boldsymbol{\theta}^*, \sigma^*}(\cdot|X)$, and set $\hat{Y} = Y$. We now discuss the sampling scheme for a general DPP, which naturally extends to sampling from a conditional DPP. We start by describing how to draw a random sample from a special class of DPPs, known as elementary DPPs, and then use this method to draw samples from a general DPP.

1: procedure SampleDPP($L$)
2:     Eigendecomposition of $L$: $\{(\lambda_n, \mathbf{v}_n)\}_{n=1}^{N}$
3:     $J \leftarrow \emptyset$
4:     for $n = 1, \dots, N$ do
5:         $J \leftarrow J \cup \{n\}$ with probability $\lambda_n / (\lambda_n + 1)$
6:     $V \leftarrow \{\mathbf{v}_n\}_{n \in J}$
7:     $Y \leftarrow \emptyset$
8:     $k \leftarrow |V|$
9:     for $1$ to $k$ do
10:        select $i$ from $\mathcal{Y}$ with probability $\frac{1}{|V|} \sum_{\mathbf{v} \in V} (\mathbf{v}^{\top} \mathbf{e}_i)^2$
11:        $Y \leftarrow Y \cup \{i\}$
12:        $V \leftarrow V_{\perp}$, an orthonormal basis for the subspace of $V$ orthogonal to $\mathbf{e}_i$     return $Y$
Algorithm 1 Sampling from a DPP

A DPP on $\mathcal{Y}$ is called elementary if every eigenvalue of its marginal kernel lies in $\{0, 1\}$. Thus, an elementary DPP can be denoted as $\mathcal{P}^{V}$, where $V = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is a set of orthonormal vectors whose marginal kernel is $K^{V} = \sum_{\mathbf{v} \in V} \mathbf{v}\mathbf{v}^{\top}$. We now establish that samples drawn according to $\mathcal{P}^{V}$ always have fixed size.

Lemma 1.

If $\mathbf{Y} \sim \mathcal{P}^{V}$, then $|\mathbf{Y}| = |V|$ almost surely.

Proof:

If $|Y| > |V|$, then $\mathcal{P}^{V}(\mathbf{Y} = Y) = \det(K^{V}_{Y}) = 0$, since $\operatorname{rank}(K^{V}) = |V|$. Hence $|\mathbf{Y}| \leq |V|$ almost surely. Now, $\mathbb{E}\big[|\mathbf{Y}|\big] = \mathbb{E}\big[\sum_{n} \mathbf{1}(n \in \mathbf{Y})\big] = \sum_{n} K^{V}_{nn} = \operatorname{tr}(K^{V}) = |V|$. Together, these imply $|\mathbf{Y}| = |V|$ almost surely. ∎

Our objective is to find a method to draw a length-$k$ sample $Y$, where $k = |V|$. Using Lemma 1, $\mathcal{P}^{V}$ assigns positive probability only to subsets of size $k$. In what follows, we present an iterated sampling scheme that samples the elements of $Y$ from $\mathcal{Y}$ without replacement such that the joint probability of obtaining $Y$ is $\det(K^{V}_{Y})$. Without loss of generality, we assume $Y = \{1, 2, \dots, k\}$. Let $\mathbf{B}$ be the matrix whose rows contain the eigenvectors in $V$. Then $K^{V} = \mathbf{B}^{\top}\mathbf{B}$ and $\mathcal{P}^{V}(\mathbf{Y} = Y) = \det(K^{V}_{Y}) = \operatorname{Vol}^2\big(\{\mathbf{b}_i\}_{i \in Y}\big)$, where $\operatorname{Vol}\big(\{\mathbf{b}_i\}_{i \in Y}\big)$ is the volume of the parallelepiped spanned by the column vectors ($\mathbf{b}_i$-s) of $\mathbf{B}$ indexed by $Y$. Now, $\operatorname{Vol}^2\big(\{\mathbf{b}_i\}_{i \in Y}\big) = \|\mathbf{b}_1\|^2\, \operatorname{Vol}^2\big(\{\mathbf{b}_i^{\perp \mathbf{b}_1}\}_{i \in Y \setminus \{1\}}\big)$, where $\mathbf{b}_i^{\perp \mathbf{b}_1}$ denotes the projection of $\mathbf{b}_i$ onto the subspace orthogonal to $\mathbf{b}_1$. Proceeding in the same way,

$\mathcal{P}^{V}(\mathbf{Y} = Y) = \|\mathbf{b}_1\|^2\, \big\|\mathbf{b}_2^{\perp \mathbf{b}_1}\big\|^2 \cdots \big\|\mathbf{b}_k^{\perp \mathbf{b}_1, \dots, \mathbf{b}_{k-1}}\big\|^2$.   (9)

Thus, the $j$-th step ($j = 1, \dots, k$) of the sampling scheme, assuming the first $j-1$ elements have already been sampled, is to select the next element with probability proportional to the squared norm of its projected vector and then project the remaining $\mathbf{b}_i$-s onto the subspace orthogonal to the selected vector. By (9), it is then guaranteed that the joint probability of sampling $Y$ is $\det(K^{V}_{Y})$.

Having derived the sampling scheme for an elementary DPP, we are in a position to draw samples from a DPP. The sampling scheme is enabled by the fact that a DPP can be expressed as a mixture of elementary DPPs. The result is formally stated in the following Lemma.

Lemma 2.

A DPP with kernel $L = \sum_{n=1}^{N} \lambda_n \mathbf{v}_n \mathbf{v}_n^{\top}$ is a mixture of elementary DPPs:

$\mathcal{P}_L = \dfrac{1}{\det(L + I)} \sum_{J \subseteq \{1, \dots, N\}} \mathcal{P}^{V_J} \prod_{n \in J} \lambda_n$,   (10)

where $V_J = \{\mathbf{v}_n\}_{n \in J}$.

Proof:

Please refer to [3, Lemma 2.6]. ∎

Thus, given an eigendecomposition of $L$, the DPP sampling algorithm can be separated into two main steps: (i) sample an elementary DPP $\mathcal{P}^{V_J}$ with probability proportional to $\prod_{n \in J} \lambda_n$, and (ii) sample a sequence of length $|J|$ from the elementary DPP $\mathcal{P}^{V_J}$. The steps discussed thus far are summarized in Alg. 1.
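
For reference, a compact NumPy implementation of Alg. 1 might look as follows (our sketch, with our own variable names; the authors' released implementation [10] is in Matlab):

    import numpy as np

    def sample_dpp(L, rng=None):
        """Draw one sample from a DPP with L-ensemble kernel L (Alg. 1)."""
        rng = np.random.default_rng() if rng is None else rng
        lam, eigvecs = np.linalg.eigh(L)
        # step (i): pick an elementary DPP; index n kept w.p. lambda_n/(lambda_n+1)
        keep = rng.random(lam.shape) < lam / (lam + 1.0)
        V = eigvecs[:, keep]                       # N x |J| matrix
        Y = []
        while V.shape[1] > 0:
            # step (ii): Pr(i) = (1/|V|) sum_v (v^T e_i)^2, i.e. squared row norms
            p = (V ** 2).sum(axis=1)
            i = rng.choice(len(p), p=p / p.sum())
            Y.append(i)
            # project the remaining basis onto the subspace orthogonal to e_i
            j = np.argmax(np.abs(V[i]))            # a column with V[i, j] != 0
            Vj = V[:, j].copy()
            V = np.delete(V, j, axis=1)
            if V.shape[1] == 0:
                break
            V -= np.outer(Vj, V[i] / Vj[i])        # zero out row i
            V, _ = np.linalg.qr(V)                 # re-orthonormalize
        return sorted(Y)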

III-C2 MAP inference

A more formal technique is to obtain the maximum a posteriori (MAP) set, i.e., $\hat{Y} = Y^{\mathrm{MAP}} := \arg\max_{Y} \mathcal{P}(Y|X)$. However, finding $Y^{\mathrm{MAP}}$ is an NP-hard problem because of the exponential-order search space ($2^{|\mathcal{Y}(X)|}$ candidate subsets). Nevertheless, one can construct computationally efficient approximate MAP inference algorithms with complexity similar to that of random sampling. Due to space limitations, a more formal discussion of these approximation techniques is outside the scope of this paper. We refer to [9] for one possible near-optimal MAP inference scheme for DPPs, which will be used in the numerical simulations.
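
As a simple stand-in for the near-optimal scheme of [9], a common baseline is greedy MAP approximation, which repeatedly adds the item that most increases $\log\det(L_Y)$; a minimal sketch (ours, not the algorithm of [9]):

    import numpy as np

    def greedy_map(L):
        """Greedily grow Y while adding an item increases det(L_Y)."""
        N = L.shape[0]
        Y, cur = [], 0.0                       # log det of the empty minor is 0
        while len(Y) < N:
            gains = []
            for i in range(N):
                if i in Y:
                    continue
                idx = Y + [i]
                _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
                gains.append((logdet - cur, i))
            best_gain, best_i = max(gains)
            if best_gain <= 0:                 # no item increases det(L_Y)
                break
            Y.append(best_i)
            cur += best_gain
        return sorted(Y)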

IV Case Study: Link Scheduling

We now introduce the link scheduling problem, to which we apply the DPPL framework discussed in the previous section.

IV-A System Model

We consider a wireless network with $N$ Tx-Rx pairs, each with fixed link distance $d$. The network can be represented as a directed bipartite graph $\mathcal{G} = (\mathcal{N}_t, \mathcal{N}_r, \mathcal{E})$, where $\mathcal{N}_t$ and $\mathcal{N}_r$ are the independent sets of vertices denoting the sets of Tx-s and Rx-s, respectively, and $\mathcal{E} = \{(t_i, r_i)\}$ is the set of directed edges, with $t_i \in \mathcal{N}_t$ and $r_i \in \mathcal{N}_r$. Since each Tx has its dedicated Rx, the in-degree and out-degree of each node in $\mathcal{N}_t$ and $\mathcal{N}_r$ are one. Also, $|\mathcal{N}_t| = |\mathcal{N}_r| = |\mathcal{E}| = N$. An illustration of the network topology is presented in Fig. 2. Let $\bar{\mathcal{G}}$ be the complete weighted bipartite graph on $\mathcal{N}_t \cup \mathcal{N}_r$ with edge weights $h_{ji}$ for all $t_j \in \mathcal{N}_t$, $r_i \in \mathcal{N}_r$. Here $h_{ji}$ denotes the channel gain between Tx $t_j$ and Rx $r_i$.

Fig. 2: Illustration of link scheduling as a subset selection problem: a realization of the network ($\mathcal{E}$) with the active link subset ($\mathcal{E}^*$). Details of the network model are given in Section IV-E.

IV-B Problem Formulation

We assume that each link can be either in an active or an inactive state. A link is active when its Tx transmits at a power level $p_h$ and inactive when its Tx transmits at a power level $p_\ell$ (with $0 \leq p_\ell < p_h$). All links transmit over the same frequency band, whose bandwidth is assumed to be unity. Then the rate on link $i$ is $R_i = \log_2(1 + \mathtt{SINR}_i)$, where $\mathtt{SINR}_i$ is the SINR at Rx $r_i$: $\mathtt{SINR}_i = \frac{p_i h_{ii}}{\sum_{j \neq i} p_j h_{ji} + N_0}$. Here $N_0$ is the thermal noise power and $p_i$ is the transmit power of Tx $t_i$. The sum-rate maximization problem can be expressed as follows.

maximize $\sum_{i=1}^{N} \log_2\big(1 + \mathtt{SINR}_i\big)$   (11a)
subject to $p_i \in \{p_\ell, p_h\}, \quad i = 1, \dots, N$,   (11b)

where the variables are $\{p_i\}_{i=1}^{N}$. An optimal subset of simultaneously active links, denoted as $\mathcal{E}^* \subseteq \mathcal{E}$, is obtained from an optimal solution $\{p_i^*\}$ of (11). Thus, $(t_i, r_i) \in \mathcal{E}^*$ if and only if $p_i^* = p_h$.
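
To fix ideas, here is a small sketch of the objective in (11) together with a brute-force baseline that is viable only for tiny $N$ (our code; H, the power levels, and the noise are toy values):

    import itertools
    import numpy as np

    def sum_rate(H, p, N0):
        """H[j, i]: channel gain h_ji from Tx j to Rx i; p: power vector."""
        signal = p * np.diag(H)
        interference = H.T @ p - signal        # sum_{j != i} p_j h_ji
        return np.log2(1.0 + signal / (interference + N0)).sum()

    def brute_force(H, p_low, p_high, N0):
        """Enumerate all 2^N binary power allocations of problem (11)."""
        N = H.shape[0]
        best = max(itertools.product([p_low, p_high], repeat=N),
                   key=lambda p: sum_rate(H, np.array(p), N0))
        return np.array(best)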

IV-C Optimal Solution

The optimization problem in (11) is NP-hard [8]. However, for bipartite networks the problem can be solved by a low-complexity heuristic algorithm based on GP (see Alg. 2). For completeness, we provide the rationale behind its formulation in Appendix A. For further details on solving the general class of link scheduling problems, the reader is referred to [8]. Fig. 2 shows a realization of the network and the subset $\mathcal{E}^*$ chosen by Alg. 2.

1: procedure SumRateMax($\bar{\mathcal{G}}$)
2:     Initialization: given a tolerance $\epsilon > 0$ and a control parameter $\alpha > 1$, set the iteration index $k = 0$. Compute the initial guess $\hat{\mathbf{p}}^{(0)}$ (e.g., $\hat{p}_i^{(0)} = p_h$ for all $i$).
3:     repeat
4:         Solve the GP:
minimize $\prod_{i=1}^{N} t_i^{-1}$   (12a)
subject to $\alpha^{-1} \hat{p}_i^{(k)} \leq p_i \leq \alpha\, \hat{p}_i^{(k)}, \quad \forall\, i$,   (12b)
$t_i \big(\sum_{j \neq i} p_j h_{ji} + N_0\big)\, \tilde{f}_i(\mathbf{p})^{-1} \leq 1, \quad \forall\, i$,   (12c)
$p_\ell \leq p_i \leq p_h, \quad \forall\, i$,   (12d)
with the variables $\{p_i, t_i\}_{i=1}^{N}$, where $\tilde{f}_i$ is the monomial approximation (16) of $\sum_{j} p_j h_{ji} + N_0$ around $\hat{\mathbf{p}}^{(k)}$. Denote the solution by $\mathbf{p}^{(k+1)}$, set $\hat{\mathbf{p}}^{(k+1)} = \mathbf{p}^{(k+1)}$ and $k \leftarrow k + 1$.
5:     until $\|\hat{\mathbf{p}}^{(k)} - \hat{\mathbf{p}}^{(k-1)}\| < \epsilon$
6:     if $\hat{p}_i^{(k)}$ lies above the quantization threshold then
7:         $p_i^* \leftarrow p_h$
8:     else
9:         $p_i^* \leftarrow p_\ell$     return $\{p_i^*\}$
Algorithm 2 Optimization algorithm for (11)

IV-D Estimation of optimal subset with DPPL

We will now model the problem of optimal subset selection with DPPL. We train the DPP with a sequence of networks and the optimal subsets obtained by Alg. 2. For the training phase, we set the input $X$ as the realization of the network and the output $Y$ as its optimal subset $\mathcal{E}^*$. The quality measure of link $i$ follows the log-linear model (6), where the feature vector $\mathbf{f}_i(X)$ contains the received signal power of link $i$ along with its two strongest interfering powers, and the diversity measure is the Gaussian kernel evaluated on the Tx and Rx locations, where $\mathbf{t}_i$ and $\mathbf{r}_i$ denote the locations of Tx $t_i$ and Rx $r_i$, respectively. The ground set of the DPP is $\mathcal{Y}(X) = \mathcal{E}$. We denote the subset estimated by DPPL in the testing phase as $\hat{\mathcal{E}}^*$. The block diagram of DPPL is illustrated in Fig. 3. In order to ensure the reproducibility of the results, we provide the Matlab implementation of DPPL for this case study in [10].
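
One plausible way to assemble such features per network realization is sketched below (our reading of the feature description above; the exact features used in [10] may differ):

    import numpy as np

    def link_features(H, tx_xy, rx_xy, p_high, N0):
        """Quality features f_i(X) and diversity features g_i(X) per link."""
        N = H.shape[0]
        rx_power = p_high * H                # rx_power[j, i]: power of Tx j at Rx i
        F = np.zeros((N, 3))
        for i in range(N):
            interf = np.sort(np.delete(rx_power[:, i], i))
            F[i] = [rx_power[i, i], interf[-1], interf[-2]]  # needs N >= 3
        G = np.hstack([tx_xy, rx_xy])        # Tx/Rx locations drive diversity
        return np.log(F / N0), G             # log-scale quality (our choice)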

Fig. 3: Block diagram of DPPL for the link scheduling problem.

IV-E Results and Discussions

We now demonstrate the performance of DPPL through numerical simulations. We construct the network by distributing $N$ links of length $d$ uniformly at random within a disc. We assume that the channel gain is dominated by power-law path loss, i.e., $h_{ji} = A \|\mathbf{t}_j - \mathbf{r}_i\|^{-\gamma}$, where $A$ is a propagation constant and $\gamma$ is the path-loss exponent. The networks used in the training and testing phases were generated with the same parameter settings, and a few degenerate instances were discarded. The power levels $p_h$ and $p_\ell$ and the noise power $N_0$ were fixed (in dB scale). The training set was constructed from independent realizations of the network. Note that varying the network size did not change the learned parameters ($\boldsymbol{\theta}^*$, $\sigma^*$) significantly. In Fig. 4, we plot the empirical cumulative distribution functions (CDFs) of the sum-rates obtained by Alg. 2 and DPPL. We observe that the sum-rate obtained by the DPPL framework closely approximates the optimal sum-rate. We also notice that DPP MAP inference gives better sum-rate estimates than DPP sampling. We further compare the performance with the well-known SG-based model in which the simultaneously active links are obtained by an independent thinning of the actual network [7]. In particular, each link is assigned $p_h$ according to an independent and identically distributed (i.i.d.) Bernoulli random variable with activation probability $\xi$. We estimate $\xi$ by averaging the activation of a randomly selected link, which is equivalent to $\xi = \mathbb{E}[|\mathcal{E}^*|]/N$ for a fixed $N$. We see that the sum-rate under independent thinning is significantly lower than the one predicted by the DPP. The reason is that the independent thinning scheme is not rich enough to capture the spatial repulsion which exists across the links of $\mathcal{E}^*$.
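
The thinning baseline is easy to reproduce in a few lines (our sketch; xi denotes the empirical activation probability):

    import numpy as np

    def thinning_sum_rate(H, xi, p_low, p_high, N0, rng=None):
        """Sum-rate under i.i.d. Bernoulli(xi) link activation (no repulsion)."""
        rng = np.random.default_rng() if rng is None else rng
        p = np.where(rng.random(H.shape[0]) < xi, p_high, p_low)
        signal = p * np.diag(H)
        interference = H.T @ p - signal
        return np.log2(1.0 + signal / (interference + N0)).sum()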

IV-E1 Run-time Comparison

Another key strength of the proposed DPPL appears when we compare its run-time in the testing phase with that of Alg. 2 applied to the same network. In Fig. 5, we plot the run-times of the different subset selection schemes for different network sizes $N$. The absolute run-times were obtained by averaging over all iterations of all the schemes in the same computation environment. In order to obtain a unit-free measure, we normalize these absolute values by the average absolute run-time of Alg. 2 for a reference network size. We observe that DPPL is substantially faster than Alg. 2: the run-time of Alg. 2 increases exponentially with $N$, whereas the run-times of DPPL scale polynomially with $N$.

Note that DPPL is not just a sum-rate estimator of the network; it estimates the optimal subset of links significantly faster than the optimization algorithms. Thus, DPPL can be implemented in real networks to determine $\hat{\mathcal{E}}^*$ even when the network size is large. In Fig. 6, we plot the sum-rates averaged over the network realizations for a given value of $N$. Note that evaluating the optimal sum-rates for higher values of $N$ using Alg. 2 is nearly impossible due to its exponentially increasing run-time. Quite interestingly, DPPL, thanks to its fast computation, provides some crisp insights into the network behavior: as more links are added, the estimated optimal sum-rate tends to saturate (see Fig. 6). This is expected because, as long as the resources are fixed, there is a limit on the number of simultaneously active links (irrespective of $N$) that maximizes the sum-rate. If the number of active links exceeds this limit, the sum-rate may decrease because of the increased interference. We also observe that the performance gap between MAP inference and DPP sampling increases significantly at higher values of $N$.

Fig. 4: CDFs of the sum-rates obtained by different subset selection schemes.
Fig. 5: Comparison of the run-times of Alg. 2 and DPPL in the testing phase.
Fig. 6: Average rates obtained for different network sizes using DPPL.

V Conclusion

In this paper, we identified a general class of subset selection problems in wireless networks which can be solved by jointly leveraging ML and SG, two fundamentally different mathematical tools used in communications and networking. To solve these problems, we developed the DPPL framework, in which the DPP originates from SG and its learning applications have been fine-tuned by the ML community. When applied to the special case of wireless link scheduling, we found that the DPP is able to learn the underlying quality-diversity trade-off of the optimal subsets of simultaneously active links. This work has numerous extensions. From the SG perspective, it is of interest to compute analytical expressions for the key performance metrics of the network, such as the mean interference at a typical receiver or the average rate, by leveraging the analytical tractability of DPPs. From the ML perspective, DPPL can be extended to include time as another dimension and solve subset selection problems over time (e.g., the scheduling problems in cellular networks, such as proportional fair scheduling) using the space-time version of DPPL (also known as the dynamic DPP [11]). From the application side, this framework can be used to solve other subset selection problems such as user group selection in a downlink multiuser multiple-input multiple-output (MIMO) setting.

Appendix A: Formulation of Alg. 2

Since (11) is an integer programming problem, the first step is to solve a relaxed version of the problem assuming continuous power allocations. In particular, we modify the integer constraint (11b) as $p_\ell \leq p_i \leq p_h$. Since $\log_2(\cdot)$ is an increasing function, the problem can be restated as:

minimize $\prod_{i=1}^{N} (1 + \gamma_i)^{-1}$   (13a)
s.t. $\gamma_i = \mathtt{SINR}_i, \quad \forall\, i$,   (13b)
$p_\ell \leq p_i \leq p_h, \quad \forall\, i$.   (13c)

Since the objective function is decreasing in $\gamma_i$, we can replace the equality in (13b) with the inequality $\gamma_i \leq \mathtt{SINR}_i$. Using the auxiliary variables $t_i := 1 + \gamma_i$, (13) can be formulated as:

minimize $\prod_{i=1}^{N} t_i^{-1}$   (14a)
subject to $t_i \leq \dfrac{\sum_{j} p_j h_{ji} + N_0}{\sum_{j \neq i} p_j h_{ji} + N_0}, \quad \forall\, i$,   (14b)
$p_\ell \leq p_i \leq p_h, \quad \forall\, i$,   (14c)
$t_i \geq 1, \quad \forall\, i$.   (14d)

Now in (14), we observe that (14a) is a monomial function, (14b) contains a posynomial function $f_i(\mathbf{p}) := \sum_{j} p_j h_{ji} + N_0$ in the right-hand side (RHS), and all the constraints contain either monomial or posynomial functions. Hence, (14) is a complementary GP [12]. If the posynomial in (14b) can be replaced by a monomial, (14) becomes a standard GP. Since GPs can be reformulated as convex optimization problems, they can be solved efficiently irrespective of the scale of the problem. One way of approximating (14) with a GP around a given point $\hat{\mathbf{p}}$ is to replace the posynomial $f(\mathbf{p}) = \sum_{k} u_k(\mathbf{p})$ by the monomial $\tilde{f}(\mathbf{p}) = \prod_{k} \big(u_k(\mathbf{p})/\alpha_k\big)^{\alpha_k}$. From $f(\hat{\mathbf{p}}) = \tilde{f}(\hat{\mathbf{p}})$, we get

$\alpha_k = \dfrac{u_k(\hat{\mathbf{p}})}{f(\hat{\mathbf{p}})}$.   (15)

Also note that $f(\mathbf{p}) \geq \tilde{f}(\mathbf{p})$ for $\alpha_k > 0$ and $\sum_k \alpha_k = 1$, by the arithmetic-geometric mean inequality. Thus the local approximation of (14) will still satisfy the original constraint (14b). The modified inequality constraint becomes

$t_i \big(\textstyle\sum_{j \neq i} p_j h_{ji} + N_0\big)\, \tilde{f}_i(\mathbf{p})^{-1} \leq 1$,   (16)

where $\tilde{f}_i$ and the exponents $\{\alpha_k\}$ are obtained by (15).

Since (14a) is a decreasing function of $t_i$, we can substitute $t_i$ with its maximum value permitted by (16), which also satisfies the other inequality constraints. Thus, $t_i$ can be eliminated as:

$\prod_{i=1}^{N} t_i^{-1} = c \prod_{i=1}^{N} \Big(\sum_{j \neq i} p_j h_{ji} + N_0\Big) \prod_{k} u_{k,i}(\mathbf{p})^{-\alpha_{k,i}}$,   (17)

where $u_{k,i}$ and $\alpha_{k,i}$ correspond to the condensation (15) of $f_i$, and $c$ is some constant which does not affect the minimization problem. Thus, the $k$-th iteration of the heuristic runs as follows. Let $\hat{\mathbf{p}}^{(k)}$ be the current guess of the power values. The GP provides a better solution $\mathbf{p}^{(k+1)}$ around the current guess, which is set as the initial guess for the next iteration, i.e., $\hat{\mathbf{p}}^{(k+1)} = \mathbf{p}^{(k+1)}$, unless a termination criterion is satisfied. These steps are summarized in Alg. 2. To ensure that the GP does not drift away from the initial guess $\hat{\mathbf{p}}^{(k)}$, the constraint (12b) is added so that $\mathbf{p}$ remains in a local neighborhood of $\hat{\mathbf{p}}^{(k)}$. Here $\alpha > 1$ is the control parameter: the smaller the value of $\alpha$, the higher the accuracy of the monomial approximation, but the slower the convergence. For a reasonable trade-off between accuracy and speed, $\alpha$ is set to a fixed value. The algorithm terminates with a quantization step which assigns the discrete power levels $p_\ell$ and $p_h$: once we obtain the relaxed optimal power allocation $\{p_i\}$, we quantize it into the two levels by setting $p_i^* = p_\ell$ whenever $p_i$ lies below some threshold and $p_i^* = p_h$ otherwise.
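
A tiny numerical sketch of the condensation step (15) (our code; u_hat holds the posynomial terms $u_k$ evaluated at $\hat{\mathbf{p}}$):

    import numpy as np

    def condense(u_hat):
        """AM-GM condensation weights at p_hat: alpha_k = u_k(p_hat)/f(p_hat)."""
        return u_hat / u_hat.sum()

    def monomial_bound(u, alpha):
        """tilde_f(p) = prod_k (u_k(p)/alpha_k)^alpha_k <= f(p) = sum_k u_k(p)."""
        return np.prod((u / alpha) ** alpha)

    u_hat = np.array([1.0, 2.0, 3.0])     # terms u_k evaluated at p_hat
    alpha = condense(u_hat)
    assert np.isclose(monomial_bound(u_hat, alpha), u_hat.sum())  # tight at p_hat
    u = np.array([2.0, 1.0, 4.0])         # terms at some other p
    assert monomial_bound(u, alpha) <= u.sum()                    # AM-GM bound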

References

  • [1] O. Simeone, “A very brief introduction to machine learning with applications to communication systems,” IEEE Trans. on Cognitive Commun. and Networking, vol. 4, no. 4, pp. 648–664, Dec. 2018.
  • [2] A. Nenkova, L. Vanderwende, and K. McKeown, “A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization,” in Proc. SIGIR.   ACM, 2006, pp. 573–580.
  • [3] A. Kulesza, B. Taskar et al., “Determinantal point processes for machine learning,” Foundations and Trends in Machine Learning, vol. 5, no. 2–3, pp. 123–286, 2012.
  • [4] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies,” Journal of Machine Learning Research, vol. 9, pp. 235–284, Feb. 2008.
  • [5] Y. Li, F. Baccelli, H. S. Dhillon, and J. G. Andrews, “Statistical modeling and probabilistic analysis of cellular networks with determinantal point processes,” IEEE Trans. on Commun., vol. 63, no. 9, pp. 3405–3422, 2015.
  • [6] N. Miyoshi and T. Shirai, “A cellular network model with Ginibre configured base stations,” Advances in Applied Probability, vol. 46, no. 3, pp. 832–845, 2014.
  • [7] B. Błaszczyszyn and P. Keeler, “Determinantal thinning of point processes with network learning applications,” 2018, available online: arxiv.org/abs/1810.08672.
  • [8] P. C. Weeraddana, M. Codreanu, M. Latva-aho, A. Ephremides, C. Fischione et al., “Weighted sum-rate maximization in wireless networks: A review,” Foundations and Trends in Networking, vol. 6, no. 1–2, pp. 1–163, 2012.
  • [9] J. Gillenwater, A. Kulesza, and B. Taskar, “Near-optimal MAP inference for determinantal point processes,” in Advances in Neural Information Processing Systems 25.   Curran Associates, Inc., 2012, pp. 2735–2743.
  • [10] C. Saha and H. S. Dhillon, “Matlab code for determinantal point process learning,” 2019, available at: github.com/stochastic-geometry/DPPL.
  • [11] T. Osogami, R. H. Putra, A. Goel, T. Shirai, and T. Maehara, “Dynamic determinantal point processes,” in Proc. AAAI, 2018.
  • [12] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, “A tutorial on geometric programming,” Optimization and Engineering, vol. 8, no. 1, p. 67, Apr 2007.