Effective resource allocation is crucial for performance optimization in wireless networks. However, typical resource allocation problems, such as power control [7, 8], are non-convex and computationally challenging. Moreover, they need to be solved in real time to accommodate the time variation of wireless channels. Great efforts have been made to develop effective algorithms for wireless resource allocation, and design solutions have been obtained with powerful convex optimization based approaches. Nevertheless, the resulting algorithms still fall short, given the increasing density of wireless networks and the more stringent latency requirements of emerging mobile applications.
Inspired by the recent successes of deep learning, researchers have attempted to apply deep learning based methods to solve NP-hard optimization problems in wireless networks [1, 2, 3, 9, 10, 11]. As a classic wireless resource allocation problem, power control in the $K$-user interference channel has attracted most of the attention [1, 2, 3, 4, 5]. The first attempts came from [1, 2], which applied a multilayer perceptron (MLP) and a convolutional neural network (CNN), respectively, to approximate the classic weighted minimum mean square error (WMMSE) algorithm [12] and accelerate the computation. Unsupervised learning and an ensembling mechanism were employed in [3] to achieve better performance than the sub-optimal WMMSE algorithm.
However, MLP and CNN, which are designed for image processing, may not be suitable for problems in wireless communication. In particular, the performance of these methods degrades dramatically when the network size becomes large. This is because MLP and CNN fail to exploit the underlying topology of wireless networks. To enable more efficient learning, spatial convolution [4] and graph embedding [5] have been proposed to exploit the Euclidean geometry of the users' geolocations. These methods are scalable to large-size networks. However, they have a major disadvantage: they cannot utilize the instantaneous channel state information (CSI), which cannot be embedded into the Euclidean space. This leads to poor performance in fading channels. Another drawback is that they have trouble dealing with heterogeneous problems, e.g., weighted sum rate maximization. A comparison of the existing works on $K$-user interference channel power control is shown in Table I.
Graph neural networks (GNNs) can effectively exploit non-Euclidean data, e.g., CSI. In this paper, to overcome the limitations mentioned above, we propose to employ GNNs for wireless power control in $K$-user interference channels. Specifically, a $K$-user interference channel can be naturally modeled as a complete graph, where the quantitative information of wireless networks, e.g., CSI, is incorporated as the features of the graph. Based on the principle of graph neural networks, we propose interference graph convolutional networks (IGCNet) to learn the optimal resource allocation in an unsupervised manner. It is shown that IGCNet is a universal approximator of continuous set functions, which preserves the permutation invariance property of the interference links. Extensive simulations will demonstrate that the proposed IGCNet not only outperforms the state-of-the-art optimization-based WMMSE algorithm and existing learning-based methods under various system configurations, but also achieves a significant speedup over WMMSE. Furthermore, we will show that the proposed IGCNet can handle estimation uncertainty, e.g., CSI uncertainty, both theoretically and empirically. For reproducibility, the code to produce the results in this paper has been made available on GitHub (https://github.com/yshenaw/Globecom2019).
II-A System Model
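We consider the power control problem in a $K$-user interference channel with single-antenna transceiver pairs. The received signal at the $k$-th receiver is given by
$$ y_k = h_{k,k} s_k + \sum_{j \neq k} h_{j,k} s_j + n_k, $$
where $h_{k,k}$ denotes the direct-link channel between the $k$-th transmitter and receiver, $h_{j,k}$ denotes the cross-link channel between transmitter $j$ and receiver $k$, $s_k$ denotes the data symbol for the $k$-th receiver, and $n_k \sim \mathcal{CN}(0, \sigma_k^2)$ is the additive Gaussian noise.
The signal-to-interference-plus-noise ratio (SINR) for the $k$-th receiver is given by
$$ \mathrm{SINR}_k = \frac{|h_{k,k}|^2 p_k}{\sum_{j \neq k} |h_{j,k}|^2 p_j + \sigma_k^2}, $$
where $p_k = \mathbb{E}[|s_k|^2]$ is the power of the $k$-th transmitter, and $0 \le p_k \le P_{\max}$.
Denote $\mathbf{p} = [p_1, \ldots, p_K]^T$ as the power allocation vector. The objective is to find the optimal power allocation to maximize the weighted sum rate, and the problem is formulated as
$$ \underset{\mathbf{p}}{\text{maximize}} \;\; \sum_{k=1}^{K} w_k \log_2\left(1 + \mathrm{SINR}_k\right) \quad \text{subject to} \;\; 0 \le p_k \le P_{\max}, \; \forall k, $$
where $w_k$ is the weight for the $k$-th pair. The channel matrix is defined as $\mathbf{H} = [h_{j,k}] \in \mathbb{C}^{K \times K}$ and $\mathbf{w} = [w_1, \ldots, w_K]^T$.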
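As a concrete illustration of the objective described above, the following minimal Python sketch evaluates the per-receiver SINR and the weighted sum rate for a given power vector. The indexing convention `h[j][k]` (transmitter j to receiver k), the function names, and the unit noise variance are assumptions made for this example and are not taken from the released code.

```python
import math

def sinr(h, p, noise_var):
    """Per-receiver SINR; h[j][k] is the channel from transmitter j to receiver k (assumed)."""
    n = len(p)
    values = []
    for k in range(n):
        signal = abs(h[k][k]) ** 2 * p[k]
        interference = sum(abs(h[j][k]) ** 2 * p[j] for j in range(n) if j != k)
        values.append(signal / (interference + noise_var))
    return values

def weighted_sum_rate(h, p, w, noise_var=1.0):
    """Weighted sum of log2(1 + SINR_k) over all transceiver pairs."""
    return sum(wk * math.log2(1.0 + s) for wk, s in zip(w, sinr(h, p, noise_var)))

# Two-user toy example with strong cross links (hypothetical values).
h = [[1.0, 2.0],
     [2.0, 1.0]]
w = [1.0, 1.0]
both_on = weighted_sum_rate(h, [1.0, 1.0], w)  # both transmitters at full power
one_on = weighted_sum_rate(h, [1.0, 0.0], w)   # second transmitter switched off
```

In this toy setting, switching one transmitter off gives a higher sum rate than transmitting at full power on both links, which shows that full-power transmission is not optimal in general.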
This problem is known to be NP-hard. Although several optimization-based methods have been proposed in [12, 14], they are computationally demanding, and thus cannot be applied for real-time implementation. To alleviate the computation burden while achieving near-optimal performance, machine learning based methods have been proposed. Specifically, MLP [1, 3] and CNN [2] have been used to approximate the input-output mapping of this problem. The optimization-based methods involve many iterations, with each iteration having a time complexity of . In contrast, the total complexities of these learning-based methods are , and thus they can achieve significant speedups.
II-B Existing Approaches’ Limitations
Fig. 1 illustrates MLP and CNN based approaches for power control. From the numerical experiments in [1, 3], we observe a performance loss when $K$ gets larger. For example, in , the performance gap to the WMMSE algorithm is when and becomes when . From the perspective of approximation theory, an MLP with a sufficient number of parameters can approximate any continuous mapping if we have sufficient training samples [15]. However, in practice, there is a lot of redundancy in an MLP because it is fully connected. Such redundancy causes overfitting and makes the network difficult to train. This is the reason for the performance loss when the input and output dimensions are large.
CNN has demonstrated its effectiveness in solving such performance deterioration problems in image analysis applications, but it is not effective for wireless power control. Specifically, for images, the geometric property means that adjacent pixels are meaningful to be considered together. In CNN, a 2D convolution kernel is applied to each patch (adjacent pixels) in the image. The weights in the neural network are shared among different patches. This significantly reduces the number of parameters, which in turn lowers the sample complexity and makes training easier. This accounts for the superior performance of CNN in image processing. Unfortunately, the geometric property of images does not hold for a channel matrix, since a patch of the matrix does not carry any specific meaning for the power control problem. Thus, although using CNN can reduce the number of parameters, it suffers from a large performance degradation, as will be further shown in Section IV-A.
There have been some attempts to leverage the geometry of users’ geolocation to achieve scalability, i.e., the ability to deal with large-size wireless systems. One study [4] applied the idea of convolution in the spatial domain. Specifically, the whole considered area is first divided into -by- grids, followed by computing the number of active users in each grid as its density. The spatial convolution is a convolution operator on the density grid. In this scenario, the neighbor grids are useful because the nearest users cause the strongest interference. One major drawback of this work is that it requires a large number of samples for training. To address this issue, [5] proposed to use distance quantization and graph embedding. However, spatial convolution and distance quantization operate only on the distances and cannot incorporate instantaneous CSI. This results in poor performance when fading exists (as shown in Table V in ). Furthermore, these methods are not able to deal with weighted problems.
In the next section, we will discuss the geometric properties of $K$-user interference channels and design the corresponding neural network.
III Learning Optimal Power Control on Interference Graph
In this section, we first model a $K$-user interference channel as a complete graph, followed by a brief introduction to GNNs. Under the framework of GNNs, we propose IGCNet to learn the optimal power control on an interference graph in an unsupervised manner. The theoretical analysis for IGCNet is presented at the end of this section.
III-A Graph Representation and Geometric Properties
In this subsection, we model the $K$-user interference channel as a complete graph with vertex and edge labels. We view the $k$-th transmitter-receiver pair as the $k$-th vertex. The vertex label contains the state of the direct channel $h_{k,k}$ and the weight $w_k$ of the $k$-th pair. An edge between two vertices $j$ and $k$ indicates an interference link, labeled with the states of the interference channels $h_{j,k}$ and $h_{k,j}$. An illustration of a 3-user interference channel is shown in Fig. 2.
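The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_interference_graph`, the dictionary layout, and the use of channel magnitudes as labels are hypothetical choices, and `h[j][k]` is assumed to hold the channel from transmitter j to receiver k.

```python
def build_interference_graph(h, w):
    """Return vertex and edge labels of the complete interference graph."""
    n = len(w)
    # Vertex k carries the direct-link state and the weight of pair k.
    vertices = {k: (abs(h[k][k]), w[k]) for k in range(n)}
    # Each unordered pair {j, k} carries both cross-link states.
    edges = {(j, k): (abs(h[j][k]), abs(h[k][j]))
             for j in range(n) for k in range(j + 1, n)}
    return vertices, edges

# 3-user toy example with hypothetical channel values.
h = [[1.0, 0.2, 0.3],
     [0.4, 0.9, 0.1],
     [0.5, 0.6, 0.8]]
w = [1.0, 2.0, 0.5]
vertices, edges = build_interference_graph(h, w)
```

For three users this yields three labeled vertices and three labeled edges, i.e., a complete graph.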
We next discuss the geometry of the interference channel by looking at the map from the channel matrix and weights to the optimal power control vector.
For a given , let denote the function that maps the channel matrix and the weights to the optimal power allocation of the -th transmitter, i.e., , and let denote any permutation matrix satisfying . Then, .
This can be interpreted as the unordered property of interference channels: it is the collection of interference channel coefficients, rather than the ordering of these coefficients, that matters. The irrelevance of the ordering leads to the permutation invariance property of the channel matrix. This property suggests that only considering the neighborhood elements, as in CNN, is meaningless because the elements are no longer close to each other after the permutation. The invariance property also indicates that all the edges with the same end node are homogeneous, which allows us to share weights among all the edges of a node. In other words, we can restrict the hypothesis space of the designed neural network for one node to the space of set functions, which leads to GNNs.
III-B Graph Neural Networks
In this subsection, we give a brief introduction to GNNs; one can refer to [17, 13] for more details. GNNs deal with learning problems on graph data or non-Euclidean data. There are many successful applications of GNNs, such as recommendation systems [18] and solving combinatorial problems [19]. GNNs utilize the graph structure of data, the node features, and the edge features to learn a good representation of the vertices. Like MLP or CNN, GNNs have layer-wise structures. In each layer, for each vertex, GNNs update the representation of this node by aggregating features from its edges and its neighbor vertices. Specifically, the update rule of the -th layer at vertex in GNNs is
where denotes the set of the neighbors of , denotes the set of edges with as one end node, and are two functions, denotes the -th layer’s output feature of vertex , and is an intermediate variable.
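The layer-wise update described above can be sketched generically as follows. The names `aggregate` and `combine` for the two functions, the dictionary-based graph encoding, and the toy weighted-sum and ReLU choices are assumptions made purely for illustration.

```python
def gnn_layer(features, neighbors, edge_feat, aggregate, combine):
    """One generic GNN layer: aggregate over neighbors and edges, then combine."""
    updated = {}
    for v, own in features.items():
        neighbor_feats = [features[u] for u in neighbors[v]]
        incident_edges = [edge_feat[frozenset((u, v))] for u in neighbors[v]]
        alpha = aggregate(neighbor_feats, incident_edges)  # intermediate variable
        updated[v] = combine(own, alpha)
    return updated

def weighted_sum(neighbor_feats, incident_edges):
    """Toy aggregation: neighbor features scaled by their edge features."""
    return sum(f * e for f, e in zip(neighbor_feats, incident_edges))

def add_relu(own, alpha):
    """Toy combination: add the aggregate to the own feature, then apply ReLU."""
    return max(0.0, own + alpha)

# Path graph 0 - 1 - 2 with scalar features (hypothetical values).
features = {0: 1.0, 1: 2.0, 2: 3.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
edge_feat = {frozenset((0, 1)): 0.5, frozenset((1, 2)): 2.0}
updated = gnn_layer(features, neighbors, edge_feat, weighted_sum, add_relu)
```

Different GNN variants correspond to different choices of the two callables, while the surrounding loop stays the same.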
The design of the two functions in GNNs is crucial and leads to different kinds of GNNs . The most popular GNNs are listed as follows.
Structure2Vec [6]: It uses sum pooling and ReLU as the aggregation and combination functions,
where , and and are the weight matrices to be learned.
Graph Isomorphism Network [17]: It uses an MLP and sum pooling as the aggregation and combination functions,
where , is an MLP, and different MLPs are used in different layers.
III-C Interference Graph Convolutional Networks
In this paper, we design the aggregation and combination functions of a GNN following two principles based on the observations in Section III-A. First, the designed neural network should capture the permutation invariance property of the interference channel, as stated in Proposition 1. Second, the designed neural network should be robust to inaccurate measurements, e.g., imperfect CSI, which is critical for practical implementation. Thus, the proposed aggregation functions (neural networks) should not only be good approximators of set functions, but also be robust to corruptions of the edge labels. For the first requirement, the idea is to use a symmetric function on transformed elements in the point set to approximate a general function defined on the set :
where , and is a symmetric function. For the implementation of this paper, we use a -layer MLP as , and as , and a -layer MLP as . Specifically, the update rule is
where is to take the largest value in a set, MLP1 and MLP2 denote two different MLPs, CONCAT denotes the operation that concatenates two vectors together, and denotes the feature vector of the edge connecting vertex and vertex . It will be shown in the next subsection that the proposed aggregation function also satisfies the second requirement. An illustration of the proposed network structure and parameter setting is shown in Fig. 3.
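The update rule described above (MLP1 on the concatenated neighbor and edge features, MAX pooling, then MLP2 on the concatenation with the vertex's own feature) can be sketched in plain Python. This is a minimal sketch with random, untrained weights; the helper names, layer sizes, feature dimensions, and the ReLU on every layer are assumptions and do not reproduce the parameter setting of Fig. 3.

```python
import random

def relu(vec):
    return [max(0.0, x) for x in vec]

def make_mlp(sizes, rng):
    """Random (weight, bias) pairs for an MLP with the given layer sizes."""
    return [([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass; ReLU after every layer is a simplification for this sketch."""
    for weight, bias in params:
        x = relu([sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(weight, bias)])
    return x

def igc_layer(features, edge_feat, mlp1, mlp2):
    """One aggregate-and-combine step on a complete graph with at least 2 vertices.

    features[v]       : feature list of vertex v
    edge_feat[(u, v)] : feature list of the edge between u and v, seen from v
    """
    n = len(features)
    updated = []
    for v in range(n):
        # MLP1 transforms CONCAT(neighbor feature, edge feature) for each neighbor.
        messages = [mlp(mlp1, features[u] + edge_feat[(u, v)])
                    for u in range(n) if u != v]
        # MAX pooling: elementwise maximum over the set of messages.
        alpha = [max(column) for column in zip(*messages)]
        # MLP2 combines the vertex's own feature with the pooled message.
        updated.append(mlp(mlp2, features[v] + alpha))
    return updated

rng = random.Random(0)
n = 4
h = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
w = [1.0] * n
features = [[abs(h[v][v]), w[v]] for v in range(n)]
edge_feat = {(u, v): [abs(h[u][v]), abs(h[v][u])]
             for u in range(n) for v in range(n) if u != v}
mlp1 = make_mlp([4, 8, 4], rng)   # input: 2 vertex dims + 2 edge dims
mlp2 = make_mlp([6, 8, 2], rng)   # input: 2 own dims + 4 pooled dims
out = igc_layer(features, edge_feat, mlp1, mlp2)
```

Because the pooled message is a maximum over an unordered set, relabeling the users only relabels the outputs; the same weights are shared by all vertices and edges, so the number of parameters does not grow with the number of users.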
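The loss function adopted is the negative sum rate, as in [2, 3],
$$ \ell(\theta) = -\mathbb{E}_{\mathbf{H}}\left[\sum_{k=1}^{K} w_k \log_2\left(1 + \frac{|h_{k,k}|^2 p_k(\theta)}{\sum_{j \neq k} |h_{j,k}|^2 p_j(\theta) + \sigma_k^2}\right)\right], $$
where $\theta$ denotes the parameters of the neural networks and $p_k(\theta)$ is the power value generated by the neural networks. Note that this loss function is differentiable and can be directly optimized by stochastic gradient descent. Thus, IGCNet is an unsupervised method, and it only needs the channel matrices as samples, without labels, for training.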
The proposed IGCNet can also deal with other objective functions in the $K$-user interference channel. To achieve this goal, one can simply replace the loss function with the negative of the objective function to be maximized.
III-D Theoretical Analysis
In this subsection, we show the universal approximation property and robustness of the proposed IGCNet.
(Universal Approximation) Suppose is a continuous set function. Then, a continuous function h and a symmetric function , such that ,
where is the elements in , is a continuous function, and POOLING is a pooling operation, i.e., or or .
(Robustness) Suppose such that and . Then,
Theorem 1 states that one-layer IGCNet is a universal approximator of continuous set functions. Note that the aggregation function of GNNs is a set function. Thus, IGCNet has the aggregation function with the most powerful representation ability in the class of GNNs. Besides, it well respects the permutation invariance property of the interference channel because the set function is permutation invariant to the input.
Theorem 2 states that remains the same up to the corruptions of the input if all the features in are preserved and only contains a bounded number of features, smaller than . This means that only a small proportion of features are critical and IGCNet is robust to the corruptions of other features. The practical meaning is that IGCNet can provide a near-optimal solution even when some CSI is not available.
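The intuition behind this robustness can be illustrated with a small sketch of max pooling over a set of feature vectors: only the points that attain the maximum in some dimension influence the pooled output. The helper names and the toy feature values below are hypothetical, and the features are assumed to be nonnegative so that zeroing a point cannot raise any maximum.

```python
def max_pool(points):
    """Elementwise maximum over a set of equal-length feature vectors."""
    return [max(column) for column in zip(*points)]

def critical_indices(points):
    """One index per feature dimension: the first point attaining the maximum there."""
    pooled = max_pool(points)
    critical = set()
    for d, peak in enumerate(pooled):
        critical.add(next(i for i, pt in enumerate(points) if pt[d] == peak))
    return critical

points = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.2],
    [0.4, 0.3, 0.7],
    [0.3, 0.5, 0.6],
]
critical = critical_indices(points)
# Keep only the critical points, or zero out all the others (e.g., missing CSI).
kept = [points[i] for i in sorted(critical)]
corrupted = [pt if i in critical else [0.0] * len(pt)
             for i, pt in enumerate(points)]
```

In both cases the pooled feature is unchanged, and the number of critical points is at most the feature dimension, which is the mechanism behind the robustness statement.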
IV-A Gaussian Interference Channel Power Control
to set up simulations. Under this system setting, all the weights are the same and the channel coefficients are taken from the standard normal distribution. In the experiment, three network setups are considered. We mainly compare the proposed IGCNet with the following five benchmarks:
MLP [1]: It leverages an MLP to learn the input-output mapping of WMMSE.
PCNet [3]: It employs an MLP and an unsupervised loss function to learn a near-optimal power allocation.
DPC [2]: It uses a CNN and the unsupervised loss function to learn a near-optimal power control.
Baseline: We find a fixed proportion of pairs with the largest coefficients , and set the power of these pairs as , while the power of the other pairs is set as . This algorithm, which ignores the interference, is a simple but effective heuristic, and we use it as the baseline.
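The baseline heuristic can be sketched as follows. Since the exact values are not reproduced here, the sketch assumes that the selected pairs are those with the largest direct-link magnitudes, that they transmit at a maximum power `p_max`, and that all other pairs are switched off; the function name and these choices are assumptions for illustration.

```python
def baseline_power(h, proportion, p_max=1.0):
    """Activate the given proportion of pairs with the strongest direct links."""
    n = len(h)
    num_active = int(proportion * n)
    # Rank pairs by direct-link magnitude, strongest first; interference is ignored.
    order = sorted(range(n), key=lambda k: abs(h[k][k]), reverse=True)
    active = set(order[:num_active])
    return [p_max if k in active else 0.0 for k in range(n)]

# 4-user toy example with hypothetical channel values.
h = [[0.2, 0.1, 0.1, 0.1],
     [0.1, 1.5, 0.1, 0.1],
     [0.1, 0.1, 0.9, 0.1],
     [0.1, 0.1, 0.1, 0.1]]
power = baseline_power(h, proportion=0.5)
```

The proportion acts as a single tunable parameter that can be selected on a validation set.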
We generate training samples, i.e., network realizations, to train MLP, PCNet, and DPC as in [1, 2], while the number of training samples used for IGCNet is . The test dataset contains network realizations. We use a -layer IGCNet and adopt the Adam optimizer with a learning rate of to train IGCNet. The simulation results are shown in Table II.
It is shown that the proposed IGCNet not only outperforms the other learning-based methods, but also achieves better performance than WMMSE. We also see that the performance of IGCNet is stable, while the other learning-based methods suffer from performance degradation when $K$ increases. The difference between PCNet and IGCNet is that IGCNet utilizes the graph structure and shares its weights. This suggests that leveraging the graph structure of the interference channel is useful for maintaining good performance when the network size is large. Besides, we observe that the performance gap between other learning-based methods and the baseline vanishes when . This may imply that these models can hardly learn the impact of interference when $K$ is large.
We next test the performance of IGCNet in the weighted sum rate maximization. We take from the uniform distribution in and test the performance of IGCNet, WMMSE, and the baseline algorithm. The results are shown in Table III.
From Table III, we see that IGCNet performs better than the other benchmarks. This demonstrates that IGCNet can handle the weighted problem without requiring a large number of samples.
An important test of the usefulness of a neural network design is its ability to generalize to different layouts and link distributions. In this subsection, we test the generalization performance of the proposed IGCNet in two different settings.
IV-B1 Varying User Locations
We first test the performance of the proposed algorithm under the situation where the locations of users in each sample vary. The transmitters are uniformly distributed in the square region meters. The receivers are uniformly distributed within meters away from the transmitter. The adopted channel model is
where the path loss model is ,
is the shadowing coefficient, the standard deviation of log-normal shadowing is dB, dBi is the transmit antenna power gain, is the small scale fading, and the noise power is dBm. We use equal weights to test the performance of different algorithms, with the results shown in Table IV. We see that IGCNet also has a superior performance under the system settings with varying user locations.
IV-B2 Varying Distance Distribution
It was reported in  that spatial convolution is sensitive to the link distance distribution. We also check the performance of IGCNet when the link distance distribution in the test is different from that in the training. We follow  to set up the simulation. The link distance is uniformly distributed in meters during training. In the test, the link distance is uniformly distributed in meters, where is uniform in meters and is uniform in meters. The performance of IGCNet is compared to WMMSE in this situation. It shows that the performance of IGCNet under this setting is still good.
In this subsection, the situations with partial CSI and noisy CSI are tested. We use the pre-trained model for in Section IV-B1.
IV-C1 Partial CSI
We simulate the situation where the CSI of some links cannot be obtained. To test the performance under this partial CSI setting, we set a fixed proportion of with the largest distance as . We define the relative performance as the sum rate achieved by IGCNet with the partial CSI divided by that achieved with full CSI. The relative performance of IGCNet versus the missing CSI ratio is shown in Fig. 4. We see that IGCNet achieves performance of the full CSI case when of the links are set to in the available CSI, which verifies the robustness shown in Theorem 2.
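The partial CSI setting can be emulated with a simple masking step, sketched below. The sketch assumes that only cross links are masked, that a missing coefficient is replaced by zero, and that a matrix `dist` of transmitter-to-receiver distances is available; these choices and the function name are assumptions for illustration.

```python
def mask_farthest_links(h, dist, missing_ratio):
    """Zero out the given ratio of cross links with the largest distances."""
    n = len(h)
    links = [(j, k) for j in range(n) for k in range(n) if j != k]
    links.sort(key=lambda jk: dist[jk[0]][jk[1]], reverse=True)
    num_missing = int(missing_ratio * len(links))
    masked = [row[:] for row in h]  # copy so the full CSI is left untouched
    for j, k in links[:num_missing]:
        masked[j][k] = 0.0
    return masked

# 3-user toy example with hypothetical channel values and distances (in meters).
h = [[1.0, 0.3, 0.2],
     [0.4, 1.0, 0.1],
     [0.25, 0.15, 1.0]]
dist = [[10, 50, 80],
        [55, 10, 90],
        [70, 95, 10]]
partial = mask_farthest_links(h, dist, missing_ratio=0.5)
```

The relative performance is then the sum rate obtained from the masked input divided by the sum rate obtained from the full channel matrix, both evaluated on the true channel.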
IV-C2 Noisy CSI
We simulate the situation where the CSI is inaccurate. We use the pre-trained model in Section IV-B1. To test the performance under the noisy CSI setting, we use the noisy channel matrix as the input of IGCNet, where . We define the relative noise variance as . The relative performance of IGCNet versus the missing CSI ratio is shown in Fig. 4. We see that IGCNet achieves its performance when the relative noise variance is . This suggests IGCNet is robust to CSI inaccuracy.
IV-D Time Comparison
It was reported in [1, 3] that learning-based methods have less computation time than the optimization-based methods. We also compare the average running time for WMMSE and IGCNet under the system setting in Section IV-B1, as shown in Table V. It can be concluded that IGCNet is significantly faster than WMMSE, up to about 65x speedup when . This is because WMMSE involves many iterations, and each iteration has time complexity while the total complexity of IGCNet is .
IV-E Ablation Study
We provide an ablation study of IGCNet in terms of the number of training samples and the number of layers of IGCNet. We assume Rayleigh fading and .
We first study the impact of the number of training samples. We set the number of training samples as and observe the test performance of IGCNet. The results are shown in Table VI. We see that the performance first increases and then becomes stable as the number of samples increases. It also suggests samples are sufficient for this setting, which is far fewer than the samples needed in previous works.
We then study the impact of the number of layers. -hop information is gathered if a -layer IGCNet is used. Intuitively, IGCNet with a larger number of layers will have better performance. We set the numbers of layers as and observe the test performance of IGCNet.
The performance improves as the number of layers increases. This is not surprising because IGCNet with more layers captures more information. We also see a huge performance gain from -layer IGCNet to -layer IGCNet, which shows that multi-hop information is crucial for the performance. From Table II and Table VII, we find that -layer IGCNet still outperforms MLP, PCNet, and DPC, which demonstrates the benefits of leveraging the graph structure.
In summary, the extensive simulations in Section IV have shown that:
- IGCNet not only outperforms other state-of-the-art learning-based methods, but also has better performance than the most popular optimization-based method, WMMSE, under various system configurations.
- IGCNet can generalize to different layouts and link distributions.
- IGCNet is robust to partial CSI and noisy CSI.
- IGCNet achieves significant speedups over WMMSE.
In this paper, we developed a novel graph neural network for the $K$-user interference channel power control problem. Its unique advantages include scalability, the ability to incorporate instantaneous CSI, and the ability to solve weighted problems. This is achieved by leveraging the geometric property and graph structure of the interference channels. For future directions, it will be interesting to test the effectiveness of IGCNet in other wireless resource allocation problems. We envision that machine learning based methods will play a critical role in future wireless networks [22].
- [1] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for interference management,” IEEE Trans. Signal Process., vol. 66, pp. 5438–5453, Oct. 2018.
- [2] W. Lee, M. Kim, and D.-H. Cho, “Deep power control: Transmit power control scheme based on convolutional neural network,” IEEE Commun. Lett., vol. 22, pp. 1276–1279, Apr. 2018.
- [3] F. Liang, C. Shen, W. Yu, and F. Wu, “Towards optimal power control via ensembling deep neural networks,” arXiv preprint arXiv:1807.10025, 2018.
- [4] W. Cui, K. Shen, and W. Yu, “Spatial deep learning for wireless scheduling,” IEEE J. Sel. Areas Commun., vol. 37, Jun. 2019.
- [5] M. Lee, G. Yu, and G. Y. Li, “Graph embedding based wireless link scheduling with few training samples,” arXiv preprint arXiv:1906.02871, 2019.
- [6] H. Dai, B. Dai, and L. Song, “Discriminative embeddings of latent variable models for structured data,” in Proc. Int. Conf. Mach. Learning, pp. 2702–2711, Jun. 2016.
- [7] M. Chiang, P. Hande, T. Lan, C. W. Tan, et al., “Power control in wireless cellular networks,” Found. Trends Networking, vol. 2, no. 4, pp. 381–533, 2008.
- [8] Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming for green Cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, pp. 2809–2823, May 2014.
- [9] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “LORM: Learning to optimize for resource management in wireless networks with few training samples,” arXiv preprint arXiv:1812.07998, 2018.
- [10] A. Zappone, M. Di Renzo, M. Debbah, T. T. Lam, and X. Qian, “Model-aided wireless artificial intelligence: Embedding expert knowledge in deep neural networks towards wireless systems optimization,” arXiv preprint arXiv:1808.01672, 2018.
- [11] M. Lee, G. Yu, and G. Y. Li, “Learning to branch: Accelerating resource allocation in wireless networks,” arXiv preprint arXiv:1903.01819, 2019.
- [12] Q. Shi, M. Razaviyayn, Z. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, pp. 4331–4340, Sept. 2011.
- [13] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
- [14] K. Shen and W. Yu, “Fractional programming for communication systems–Part I: Power control and beamforming,” IEEE Trans. Signal Process., vol. 66, pp. 2616–2630, May 2018.
- [15] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
- [16] W. Brendel and M. Bethge, “Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet,” Proc. Int. Conf. Learning Representation, May 2019.
- [17] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” Proc. Int. Conf. Learning Representation, May 2019.
- [18] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional neural networks for web-scale recommender systems,” in Proc. ACM Int. Conf. Knowl. Discovery Data Mining, pp. 974–983, ACM, Aug. 2018.
- [19] Z. Li, Q. Chen, and V. Koltun, “Combinatorial optimization with graph convolutional networks and guided tree search,” in Proc. Adv. Neural Inform. Process. Syst., pp. 539–548, Dec. 2018.
- [20] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” Proc. Int. Conf. Learning Representation, Apr. 2017.
- [21] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in , pp. 652–660, 2017.
- [22] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G–AI empowered wireless networks,” arXiv preprint arXiv:1904.11686, 2019.