Radio resource management, e.g., power control and beamforming, plays a crucial role in wireless networks. Unfortunately, many of these problems are non-convex and computationally challenging. Moreover, they need to be solved in real time, given the time-varying wireless channels and the latency requirements of many mobile applications. Considerable effort has been devoted to developing effective algorithms for these challenging problems. Existing algorithms are mainly based on convex optimization approaches [31, 29], which have a limited capability in dealing with non-convex problems and scale poorly with the problem size. Problem-specific algorithms can be developed, but this is a laborious process that requires much problem-specific knowledge.
, to solve the power control problem, a multi-layer perceptron (MLP) was used to approximate the input-output mapping of the classic weighted minimum mean square error (WMMSE) algorithm to speed up the computation. The second paradigm is “learning alongside optimization”, which replaces an ineffective policy within a traditional algorithm with a neural network. For example, an MLP was utilized in  to replace the pruning policy in the branch-and-bound algorithm. Accordingly, significant speedups and performance gains in the access point selection problem were achieved compared with the optimization-based methods in [30, 28].
or convolutional neural networks (CNNs) [17, 36]. These architectures are inherited from those developed for image processing tasks and thus are not tailored to problems in wireless networks. Although they achieve near-optimal performance in small-scale wireless networks, they fail to exploit the wireless network structure and thus suffer from poor scalability and generalization in large-scale radio resource management problems. Specifically, the performance of these methods degrades dramatically as the wireless network size grows. For example, it was shown in  that the performance gap to the WMMSE algorithm is when and becomes when . Moreover, these methods generalize poorly when the number of agents in the test dataset is larger than that in the training dataset. In dense wireless networks, resource management may involve thousands of users simultaneously, and the number of users changes dynamically. This makes the wide application of these learning-based methods very difficult.
A long-standing idea to improve scalability and generalization is to incorporate the structures of the target task into the neural network architecture [2, 38, 32]. A prominent example is the development of CNNs for computer vision, which is inspired by the fact that neighboring pixels of an image are informative when considered together. To achieve better scalability, structures in single-antenna systems with homogeneous agents have recently been exploited for effective neural network architecture design [6, 9]. In static channels, observing that channel states are deterministic functions of users' geo-locations in a 2D Euclidean space, spatial convolution was developed in , which is applicable to wireless networks with thousands of users but cannot handle fading channels. With fading channels, it was observed that the channel matrix can be viewed as the adjacency matrix of a graph. From this perspective, a random edge graph neural network (REGNN) operating on such a graph was developed and shown to exhibit a good generalization property when the number of users in the wireless network changes. However, in a multi-antenna system or a single-antenna system with heterogeneous agents, the channel matrix no longer fits the form of an adjacency matrix, and the REGNN cannot be applied.
In this paper, we address the limitations of existing works by modeling wireless networks as wireless channel graphs and developing neural networks that exploit the graph topology. Specifically, we treat the agents as nodes in a graph, communication channels as directed edges, agent-specific parameters as node features, and channel-related parameters as edge features. Subsequently, we propose low-complexity neural network architectures operating on wireless channel graphs.
Existing works (e.g., [33, 26, 16]) also have another major limitation, namely, they treat the adopted neural network as a black box. Despite their superior performance in specific applications, it is hard to interpret what is learned by the neural networks. To ensure reliability, it is crucial to understand when the algorithm works and when it fails. Thus, a good theoretical understanding of learning-based radio resource management methods is needed. Compared with learning-based methods, conventional optimization-based methods are well studied. This motivates us to build a connection between these two types of methods. In particular, we shall prove the equivalence between the proposed neural networks and a favorable class of optimization-based methods. This equivalence will allow a tractable analysis of the performance and generalization of the learning-based methods through the study of their equivalent optimization-based methods.
In this paper, we develop scalable learning-based methods to solve radio resource management problems in dense wireless networks. The major contributions are summarized as follows:
We model wireless networks as wireless channel graphs and formulate radio resource management problems as graph optimization problems. We then show that a permutation equivariance property holds in general radio resource management problems, which can be exploited for effective neural network architecture design.
We identify a favorable class of neural networks operating on wireless channel graphs, namely MPGNNs. In such neural networks, the feature of each node is updated by aggregating information from local nodes and edges with a low-complexity permutation invariant function. Thus, MPGNNs satisfy the permutation equivariance property, and have the ability to generalize to large-scale problems while enjoying a high computational efficiency.
For an effective implementation, we propose a wireless channel graph convolution network (WCGCN) within the MPGNN class. Besides inheriting the advantages of MPGNNs, the WCGCN enjoys several unique advantages for solving radio resource management problems. First, it can effectively exploit both agent-related features and channel-related features. Second, it is insensitive to corruptions of features, e.g., channel state information (CSI), implying that it can be applied with partial and imperfect CSI.
To provide interpretability and theoretical guarantees, we prove the equivalence between MPGNNs and a class of distributed optimization algorithms, which includes many classic algorithms for radio resource management, e.g., WMMSE. Based on this equivalence, we analyze the performance and generalization of MPGNN-based methods in the weighted sum rate maximization problem.
We test the effectiveness of WCGCN for power control and beamforming problems, training with unlabeled data. Extensive simulations will demonstrate that the proposed WCGCN matches or outperforms classic optimization-based algorithms without requiring domain knowledge, and with significant speedups. Remarkably, WCGCN can solve the beamforming problem with users within milliseconds on a single GPU.¹

¹ The code to reproduce the simulation results will be made available soon.
Throughout this paper, superscripts $(\cdot)^H$, $(\cdot)^T$, and $(\cdot)^{-1}$ denote conjugate transpose, transpose, and inverse, respectively. The set symbol $\{\cdot\}$ in this paper denotes a multiset. A multiset is a 2-tuple $X = (S, m)$, where $S$ is the underlying set of $X$ formed from its distinct elements, and $m: S \to \mathbb{N}_{\geq 1}$ gives the multiplicity of the elements. For example, $\{a, a, b\}$ is a multiset where element $a$ has multiplicity 2 and element $b$ has multiplicity 1.
II Graph Modeling of Wireless Networks
In this section, we model wireless networks as graphs, and formulate radio resource management problems as graph optimization problems. Key properties of radio resource management problems will be identified, which will then be exploited to design effective neural network architectures.
II-A Directed Graphs and Permutation Equivariance Property
A directed graph can be represented as an ordered pair $G = (V, E)$, where $V$ is the set of nodes and $E \subseteq V \times V$ is the set of edges. Node $i$ is adjacent to node $j$ if $(i, j) \in E$, denoted as $i \to j$. Two graphs $G = (V, E)$ and $G' = (V', E')$ are isomorphic if there is a bijection $\pi: V \to V'$ such that $(i, j) \in E$ if and only if $(\pi(i), \pi(j)) \in E'$, denoted by $G \cong G'$. The adjacency matrix of $G$ is a $|V| \times |V|$ matrix $A$, where $A_{i,j} = 1$ if and only if $(i, j) \in E$, for all $i, j \in V$. A directed graph can thus be represented by its adjacency matrix. The permutation $\pi$ corresponds to a permutation matrix $P$. The rows (or columns) of $A$ are rearranged if $P$ is left (or right) multiplied to $A$. The matrix $PAP^T$ is also an adjacency matrix. Graphs corresponding to adjacency matrices $A$ and $PAP^T$ are isomorphic, since applying the permutation is a re-ordering of nodes, denoted by $A \cong PAP^T$.
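We now introduce optimization problems defined on directed graphs, and identify their permutation invariance and equivariance properties. We assign each node $i \in V$ an optimization variable $\gamma_i$. Denote $\gamma = [\gamma_1, \dots, \gamma_{|V|}]^T$; then an optimization problem defined on graph $G$ can be written as

$$\begin{aligned} \underset{\gamma}{\text{minimize}} \quad & g(\gamma, A) \\ \text{subject to} \quad & Q(\gamma, A) \leq 0, \end{aligned} \tag{1}$$

where $g$ represents the objective function and $Q$ represents the constraint.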
As $A \cong PAP^T$, optimization problems defined on graphs have the permutation invariance property stated below.
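Proposition II.1 (Permutation invariance). For any permutation matrix $P$, the optimization problem defined in (1) has the following property:

$$g(\gamma, A) = g(P\gamma, PAP^T), \qquad Q(\gamma, A) = Q(P\gamma, PAP^T).$$

Proof: Since the adjacency matrices $A$ and $PAP^T$ represent the same graph, permuting $\gamma$ and $A$ simultaneously is simply a reordering of the variables. As a result, we have $g(\gamma, A) = g(P\gamma, PAP^T)$ and $Q(\gamma, A) = Q(P\gamma, PAP^T)$. ∎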
The permutation invariance property of the objective value and constraint leads to the corresponding property of sublevel sets. We first define the sublevel sets.
Definition (Sublevel sets). The $t$-sublevel set of a function $f: D \to \mathbb{R}$ is defined as

$$\mathcal{C}_f(t) = \{x \in D : f(x) \leq t\},$$

where $D$ is the feasible domain.
Denote the optimal objective value of (1) as $g^*$, and the set of $\epsilon$-accurate solutions as $\mathcal{C}_g(g^* + \epsilon)$. Thus, the properties of sublevel sets imply the properties of near-optimal solutions. Specifically, the permutation invariance property of the objective function implies the permutation equivariance property of the sublevel sets, which is stated in the next proposition.
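Proposition II.2 (Permutation equivariance). Denote by $\mathcal{C}_g(t, A)$ the $t$-sublevel set of $g(\cdot, A)$ in (1) over the feasible domain $\{\gamma : Q(\gamma, A) \leq 0\}$, and define $P\,\mathcal{C} = \{P\gamma : \gamma \in \mathcal{C}\}$. Then,

$$P\,\mathcal{C}_g(t, A) = \mathcal{C}_g(t, PAP^T),$$

where $P$ is any permutation matrix.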
The permutation equivariance property of sublevel sets is a direct result of the permutation invariance in the objective function. Please refer to Appendix A for a detailed proof.
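To make the permutation invariance and equivariance statements above concrete, the following Python sketch checks both properties by brute force on a small problem. The objective `toy_objective`, the binary feasible domain $\{0,1\}^3$, and all helper names are illustrative assumptions and are not part of the formulation. A permutation is stored as an index list `perm`, with $(P\gamma)_i = \gamma_{\mathrm{perm}[i]}$.

```python
import itertools


def toy_objective(gamma, A):
    """Hypothetical graph objective (to be minimized): reward active nodes and
    penalize pairs of adjacent active nodes. It depends on node labels only via A."""
    n = len(gamma)
    reward = sum(gamma)
    penalty = sum(A[i][j] * gamma[i] * gamma[j] for i in range(n) for j in range(n))
    return -reward + 2 * penalty


def permute_vector(perm, x):
    # (P x)_i = x_{perm[i]}: row i of P has a one in column perm[i]
    return [x[perm[i]] for i in range(len(x))]


def permute_matrix(perm, A):
    # (P A P^T)_{i,j} = A_{perm[i], perm[j]}
    n = len(A)
    return [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def sublevel_set(t, A):
    # Enumerate {gamma in {0,1}^n : g(gamma, A) <= t}; the feasible domain is {0,1}^n.
    n = len(A)
    return {g for g in itertools.product((0, 1), repeat=n) if toy_objective(g, A) <= t}


A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]          # directed 3-cycle
perm = [2, 0, 1]
PA = permute_matrix(perm, A)

# Permutation invariance: the objective is unchanged by a simultaneous permutation.
for gamma in itertools.product((0, 1), repeat=3):
    assert toy_objective(gamma, A) == toy_objective(permute_vector(perm, gamma), PA)

# Permutation equivariance: permuting the sublevel set of A gives that of P A P^T.
t = -1
lhs = {tuple(permute_vector(perm, g)) for g in sublevel_set(t, A)}
rhs = sublevel_set(t, PA)
print(lhs == rhs)  # True
```

Brute-force enumeration only works for tiny graphs. It is used here because it checks the set identity exactly, without relying on any solver.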
In the next subsection, by modeling wireless networks as graphs, we show that the permutation equivariance property is universal in radio resource management problems.
II-B Wireless Network as a Graph
A wireless network can be modeled as a directed graph with node and edge features. Naturally, we treat each agent in a wireless network, e.g., a mobile user or a base station, as a node in the graph. An edge is drawn from node $i$ to node $j$ if there is a direct communication or interference link with node $i$ as the transmitter and node $j$ as the receiver. The node feature incorporates the properties of the agent, e.g., users' weights in the weighted sum rate maximization problem. The edge feature includes the properties of the corresponding channel, e.g., a scalar (or matrix) denoting the channel state of a single-antenna (or multi-antenna) system. We call the graphs generated by the wireless network topology wireless channel graphs. Formally, a wireless channel graph is an ordered tuple $G = (V, E, z, e)$, where $V$ is the set of nodes, $E$ is the set of edges, $z: V \to \mathbb{R}^{d_n}$ maps a node to its feature, and $e: E \to \mathbb{R}^{d_e}$ maps an edge to its feature. Denote $V = \{1, 2, \dots, |V|\}$. Also define the node feature array as $Z \in \mathbb{R}^{|V| \times d_n}$ with $Z_{(i,:)} = z(i)$, and the adjacency feature array as $A \in \mathbb{R}^{|V| \times |V| \times d_e}$ with

$$A_{(i,j,:)} = \begin{cases} e((i,j)), & \text{if } (i,j) \in E, \\ \mathbf{0}, & \text{otherwise}, \end{cases} \tag{2}$$

where $\mathbf{0}$ is a zero vector in $\mathbb{R}^{d_e}$.
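The node feature array and the adjacency feature array can be assembled directly from per-node and per-edge features. The sketch below shows one way to do this with plain Python lists. It assumes features are supplied as dictionaries keyed by node index and by directed edge `(i, j)`; the helper name `build_feature_arrays` and the example values are hypothetical. Node pairs without an edge receive a zero vector, as in the definition above.

```python
def build_feature_arrays(num_nodes, node_feat, edge_feat, d_e):
    """Hypothetical helper that assembles the node feature array Z
    (num_nodes x d_n) and the adjacency feature array A
    (num_nodes x num_nodes x d_e) as nested lists.

    node_feat: dict i -> list of node features z(i)
    edge_feat: dict (i, j) -> list of d_e edge features e((i, j))
    """
    assert all(len(f) == d_e for f in edge_feat.values()), "edge features must have length d_e"
    Z = [list(node_feat[i]) for i in range(num_nodes)]
    # Missing edges get a fresh zero vector of length d_e.
    A = [[list(edge_feat.get((i, j), [0.0] * d_e)) for j in range(num_nodes)]
         for i in range(num_nodes)]
    return Z, A


# Example: 3 agents with scalar node features (e.g., user weights) and scalar
# edge features (e.g., single-antenna channel gains); the values are made up.
Z, A = build_feature_arrays(
    3,
    node_feat={0: [1.0], 1: [2.0], 2: [1.0]},
    edge_feat={(0, 1): [0.3], (2, 0): [0.7]},
    d_e=1,
)
print(A[0][1], A[1][0])   # [0.3] [0.0]
```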
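We assign each node $i$ an optimization variable $\gamma_i$. Let $\gamma = [\gamma_1, \dots, \gamma_{|V|}]^T$; then an optimization problem defined on a wireless channel graph can be written as

$$\begin{aligned} \underset{\gamma}{\text{minimize}} \quad & g(\gamma, Z, A) \\ \text{subject to} \quad & Q(\gamma, Z, A) \leq 0, \end{aligned} \tag{3}$$

where $g$ denotes the objective function and $Q$ denotes the constraint.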
Next, we elaborate on the properties of radio resource management problems on wireless channel graphs. Without node features or edge features, a wireless channel graph is a directed graph. As a result, the properties of wireless channel graphs follow those of directed graphs. We now describe the permutation equivariance property of problems on wireless channel graphs. We refer to the three dimensions of $A$ as row, column, and depth. The permutation operators for $Z$ and $A$ are defined as follows. For a permutation $\pi$ with permutation matrix $P$, the node feature array is permuted as $PZ$. The left permutation operator $P \star A$, defined by $(P \star A)_{(:,:,d)} = P A_{(:,:,d)}$ for each depth index $d$, rearranges the rows. The right permutation operator $A \star P^T$, defined by $(A \star P^T)_{(:,:,d)} = A_{(:,:,d)} P^T$, rearranges the columns. Similar to optimization problems on directed graphs, those defined on wireless channel graphs have the permutation invariance property, i.e., $g(\gamma, Z, A) = g(P\gamma, PZ, P \star A \star P^T)$ and $Q(\gamma, Z, A) = Q(P\gamma, PZ, P \star A \star P^T)$. As a result, the sublevel sets of $g$ in (3) also have the permutation equivariance property, which is stated below.
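Proposition II.3 (Permutation equivariance). Let $\mathcal{C}_g(t, Z, A)$ denote the $t$-sublevel set of $g(\cdot, Z, A)$ in (3), and define $P\,\mathcal{C} = \{P\gamma : \gamma \in \mathcal{C}\}$. Then,

$$P\,\mathcal{C}_g(t, Z, A) = \mathcal{C}_g(t, PZ, P \star A \star P^T),$$

where the permutation matrix $P$, the left permutation operator $P \star$, and the right permutation operator $\star P^T$ are associated with the same permutation $\pi$.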
This result establishes a general permutation equivariance property for radio resource management problems. Proposition II.3 reduces to the results in  if $Z$ is a constant array and $d_e = 1$. Compared with , Proposition II.3 is able to handle heterogeneous agents (e.g., users with different resource constraints) and more general channels (e.g., multi-antenna channels), since heterogeneity can be modeled by node features and multi-antenna channel states can be modeled by edge features. The proof is the same as that of Proposition II.2, up to a change of notation.
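The left and right permutation operators on a three-dimensional adjacency feature array can be sketched in Python as below. The sketch confirms that applying both operators re-indexes rows and columns consistently. It then checks permutation invariance for a hypothetical integer-valued objective `toy_objective`. This objective, the example arrays, and the helper names are assumptions for illustration. Integers are used so that the equality checks are exact.

```python
import itertools


def left_permute(perm, A):
    """P ⋆ A: row i of the result is row perm[i] of A (columns and depth kept)."""
    return [[list(cell) for cell in A[perm[i]]] for i in range(len(A))]


def right_permute(A, perm):
    """A ⋆ P^T: column j of the result is column perm[j] of A."""
    return [[list(row[perm[j]]) for j in range(len(row))] for row in A]


def toy_objective(gamma, Z, A):
    """Hypothetical integer-valued objective using node features Z and the
    depth-0 slice of the adjacency feature array A."""
    n = len(gamma)
    return (sum(Z[i][0] * gamma[i] for i in range(n))
            - sum(A[i][j][0] * gamma[i] * gamma[j] for i in range(n) for j in range(n)))


Z = [[2], [5], [1]]
A = [[[0, 0], [3, 1], [0, 0]],
     [[0, 0], [0, 0], [4, 2]],
     [[1, 7], [0, 0], [0, 0]]]   # 3 x 3 x 2 adjacency feature array
perm = [2, 0, 1]

PZ = [Z[perm[i]] for i in range(3)]
PAPT = right_permute(left_permute(perm, A), perm)

# Both operators together re-index rows and columns with the same permutation.
assert all(PAPT[i][j] == A[perm[i]][perm[j]] for i in range(3) for j in range(3))

# Permutation invariance under (P gamma, P Z, P ⋆ A ⋆ P^T).
for gamma in itertools.product(range(3), repeat=3):
    Pg = [gamma[perm[i]] for i in range(3)]
    assert toy_objective(gamma, Z, A) == toy_objective(Pg, PZ, PAPT)
print("invariance holds")
```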
II-C Graph Modeling of K-User Interference Channels
In this subsection, as a specific example, we present the graph modeling of a classic radio resource management problem, i.e., beamforming for weighted sum rate maximization in a $K$-user interference channel. It will be used as the main test setting for the theoretical study in Section IV-C and the simulations in Section V. There are $K$ transceiver pairs in total, where each transmitter is equipped with $N_t$ antennas and each receiver is equipped with a single antenna. Let $v_k \in \mathbb{C}^{N_t}$ denote the beamformer of the $k$-th transmitter. The received signal at receiver $k$ is $y_k = h_{k,k}^H v_k s_k + \sum_{j \neq k} h_{j,k}^H v_j s_j + n_k$. Here $s_k \in \mathbb{C}$ denotes the unit-power data symbol of the $k$-th pair, $h_{j,k} \in \mathbb{C}^{N_t}$ denotes the channel state from transmitter $j$ to receiver $k$, and $n_k \sim \mathcal{CN}(0, \sigma_k^2)$ denotes the additive noise following the complex Gaussian distribution.
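The signal-to-interference-plus-noise ratio (SINR) for receiver $k$ is given by

$$\mathrm{SINR}_k = \frac{|h_{k,k}^H v_k|^2}{\sum_{j \neq k} |h_{j,k}^H v_j|^2 + \sigma_k^2}.$$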
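Denote $\mathbf{V} = [v_1, \dots, v_K]^T \in \mathbb{C}^{K \times N_t}$ as the beamforming matrix. The objective is to find the optimal beamformers that maximize the weighted sum rate, and the problem is formulated as

$$\begin{aligned} \underset{\mathbf{V}}{\text{maximize}} \quad & \sum_{k=1}^{K} w_k \log_2\left(1 + \mathrm{SINR}_k\right) \\ \text{subject to} \quad & \|v_k\|_2^2 \leq P_{\max}, \quad \forall k, \end{aligned} \tag{4}$$

where $w_k$ is the weight for the $k$-th pair and $P_{\max}$ is the maximum transmit power.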
We view the $k$-th transceiver pair as the $k$-th node in the graph. As distant agents cause little interference, we draw a directed edge from node $j$ to node $k$ ($j \neq k$) only if the distance between transmitter $j$ and receiver $k$ is below a certain threshold $D_{\mathrm{th}}$. An illustration of such graph modeling is shown in Fig. 1. The node feature array is given by

$$Z_{(k,:)} = [w_k, h_{k,k}^T], \quad k = 1, \dots, K,$$

and the adjacency feature array is given by

$$A_{(j,k,:)} = \begin{cases} h_{j,k}^T, & \text{if } (j,k) \in E, \\ \mathbf{0}, & \text{otherwise}, \end{cases}$$

where $\mathbf{0}$ is a zero vector. Problem (4) has the permutation equivariance property with respect to $\mathbf{V}$, $Z$, and $A$. To solve this problem efficiently and effectively, the adopted neural network should exploit the permutation equivariance property and incorporate both node features and edge features. We shall develop an effective neural network architecture to achieve this goal in the next section.
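The objective of problem (4) and the distance-based edge construction can be sketched in plain Python, with complex vectors stored as lists. The function names are hypothetical. The indexing convention `H[j][k]` = channel from transmitter `j` to receiver `k` follows the modeling above. Positions are assumed to be 2D coordinates.

```python
import math


def inner(h, v):
    """h^H v for complex vectors stored as Python lists."""
    return sum(hc.conjugate() * vc for hc, vc in zip(h, v))


def weighted_sum_rate(H, V, w, sigma2):
    """Objective of problem (4).

    H[j][k]: channel vector (length N_t) from transmitter j to receiver k
    V[k]: beamformer of transmitter k; w[k]: weight; sigma2[k]: noise power.
    """
    K = len(V)
    total = 0.0
    for k in range(K):
        signal = abs(inner(H[k][k], V[k])) ** 2
        interference = sum(abs(inner(H[j][k], V[j])) ** 2 for j in range(K) if j != k)
        total += w[k] * math.log2(1 + signal / (interference + sigma2[k]))
    return total


def interference_edges(tx_pos, rx_pos, threshold):
    """Directed edges (j, k), j != k, whenever transmitter j lies within
    `threshold` of receiver k (the distance-based graph construction)."""
    K = len(tx_pos)
    return [(j, k) for j in range(K) for k in range(K)
            if j != k and math.dist(tx_pos[j], rx_pos[k]) < threshold]


# Two single-antenna pairs (N_t = 1) without cross-channel interference.
H = [[[1 + 0j], [0j]],
     [[0j], [1 + 0j]]]
V = [[1 + 0j], [1 + 0j]]
print(weighted_sum_rate(H, V, w=[1.0, 1.0], sigma2=[1.0, 1.0]))  # 2.0
print(interference_edges([(0, 0), (10, 0)], [(1, 0), (11, 0)], threshold=10))  # [(1, 0)]
```

Keeping only edges shorter than the threshold is what bounds the node degree. This bounded degree is the property later used in the complexity and generalization arguments.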
III Neural Network Architecture Design for Radio Resource Management
In this section, we endeavor to develop a scalable neural network architecture for radio resource management problems. A favorable class of GNNs, named message passing graph neural networks (MPGNNs), will be identified. Their key properties and an effective implementation will also be discussed.
III-A Optimizing Wireless Networks via Graph Neural Networks
Most existing “learning to optimize” approaches to problems in wireless networks adopt MLPs as the neural network architecture [33, 19, 26]. Although MLPs can approximate well-behaved functions, they suffer from poor data efficiency, robustness, and generalization. A long-standing idea for improving performance and generalization is to incorporate the structures of the target task into the neural network architecture. In this way, the neural network does not need to learn such structures from data, which leads to more efficient training and better generalization, both empirically [22, 32] and provably.
As discussed above, the structures of radio resource management problems can be formulated as optimization problems on wireless channel graphs, which enjoy the permutation equivariance property. In machine learning, there are two classes of neural networks that are able to exploit the permutation equivariance property, i.e., graph neural networks (GNNs) and Deep Sets . Compared with Deep Sets, GNNs not only respect the permutation equivariance property but can also model the interactions among the agents. In wireless networks, the agents interact with each other through channels. Thus, GNNs are more favorable than Deep Sets in wireless networks. This motivates us to adopt GNNs to solve radio resource management problems.
III-B Message Passing Graph Neural Networks
In this subsection, we shall identify a favorable class of GNNs for radio resource management problems, which extend CNNs to wireless channel graphs. In traditional machine learning tasks, the data, e.g., images, can typically be embedded in a Euclidean space. Recently, there has been an increasing number of applications with data from non-Euclidean spaces that can be naturally modeled as graphs, e.g., point clouds and combinatorial problems. This has motivated researchers to develop GNNs, which effectively exploit the graph structure. GNNs generalize traditional CNNs, recurrent neural networks, and auto-encoders to graph tasks. In wireless networks, the agents are located in a Euclidean space, but the channel states cannot be embedded in a Euclidean space. Thus, the data in radio resource management problems is also non-Euclidean. Neural networks operating on non-Euclidean spaces are therefore necessary when adopting “learning to optimize” approaches in wireless networks.
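As background, we first introduce CNNs, which operate on Euclidean data. Compared with MLPs, CNNs have shown superior performance in image processing tasks. The motivation for CNNs is that adjacent pixels in an image are meaningful when considered together. Like MLPs, CNNs have a layer-wise structure, and in each layer a 2D convolution is applied to the input. Here we consider a simple CNN with rectified linear units (ReLUs) and without pooling. In the $l$-th layer, for a pixel located at $(i, j)$, the update is

$$x^{(l)}_{i,j} = \mathrm{ReLU}\Big(W^{(l)} \cdot \mathrm{CONCAT}\big(\{x^{(l-1)}_{u,v} : (u, v) \in \mathcal{N}(i, j)\}\big)\Big), \tag{5}$$

where $x^{(0)}_{i,j}$ denotes pixel $(i, j)$ of the input image, $x^{(l)}_{i,j}$ denotes the hidden state of pixel $(i, j)$ at the $l$-th layer, $W^{(l)}$ denotes the weight matrix in the $l$-th layer, $\mathcal{N}(i, j)$ denotes the neighbor pixels of pixel $(i, j)$, and the concatenation follows a fixed spatial order. Specifically, for a convolution kernel of size $(2c+1) \times (2c+1)$, we have

$$\mathcal{N}(i, j) = \{(u, v) : |u - i| \leq c, \ |v - j| \leq c\},$$

and a common choice of $c$ is $1$, i.e., a $3 \times 3$ kernel.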
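A minimal single-channel sketch of such a CNN layer is given below. Several details are assumptions rather than part of the text: a scalar pixel per position, zero padding at the borders, and a hand-picked kernel. As in most CNN libraries, the "convolution" is implemented as cross-correlation. Each output pixel depends only on its $(2c+1) \times (2c+1)$ neighborhood.

```python
def conv_relu_layer(image, kernel):
    """One CNN layer: each output pixel is the ReLU of a weighted sum over its
    (2c+1) x (2c+1) neighborhood. Pixels outside the image are treated as 0
    (zero padding, an assumption), and the kernel is applied as cross-correlation."""
    c = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for du in range(-c, c + 1):
                for dv in range(-c, c + 1):
                    u, v = i + du, j + dv
                    if 0 <= u < rows and 0 <= v < cols:
                        s += kernel[du + c][dv + c] * image[u][v]
            out[i][j] = max(0.0, s)   # ReLU
    return out


# A single bright pixel filtered by a Laplacian-like 3 x 3 kernel (c = 1).
image = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]
print(conv_relu_layer(image, kernel))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

The same weights are reused at every pixel. SGNNs and MPGNNs carry this weight sharing over from the regular pixel grid to arbitrary graph neighborhoods.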
Despite the great success of CNNs in computer vision, they cannot be applied to non-Euclidean data. In , CNNs are extended to graphs from a spatial perspective. The resulting architecture is as efficient as CNNs while enjoying performance guarantees on the graph isomorphism test. We refer to this architecture as spatial graph convolutional networks (SGNNs). In each layer of a CNN (5), each pixel aggregates information from its neighbor pixels and then updates its state. Analogously, in each layer of an SGNN, each node updates its representation by aggregating features from its neighbor nodes. Specifically, the update rule of the $l$-th layer at vertex $i$ in an SGNN is

$$x^{(l)}_i = \alpha^{(l)}\Big(x^{(l-1)}_i, \ \phi^{(l)}\big(\{x^{(l-1)}_j : j \in \mathcal{N}(i)\}\big)\Big), \tag{6}$$

where $x^{(0)}_i = Z_{(i,:)}$ denotes the input feature of node $i$, $x^{(l)}_i$ denotes the hidden state of node $i$ at the $l$-th layer, and $\mathcal{N}(i)$ denotes the set of neighbors of $i$. The function $\phi^{(l)}$ is a set function that aggregates information from the node's neighbors, and $\alpha^{(l)}$ combines the aggregated information with the node's own information. An illustration of the extension from CNNs to SGNNs is shown in Fig. 2. In particular, SGNNs include spatial deep learning for wireless scheduling as a special case.
Despite the success of SGNNs in graph problems, it is difficult to apply SGNNs directly to radio resource allocation problems, as they cannot exploit edge features. This means that they cannot incorporate the channel states of wireless networks. We modify the definition in (6) to exploit edge features and refer to the result as message passing graph neural networks (MPGNNs). The update rule for the $l$-th layer at vertex $i$ in an MPGNN is

$$x^{(l)}_i = \alpha^{(l)}\Big(x^{(l-1)}_i, \ \phi^{(l)}\big(\{[x^{(l-1)}_j, e_{j,i}] : j \in \mathcal{N}(i)\}\big)\Big), \tag{7}$$

where $e_{j,i}$ denotes the feature of edge $(j, i)$ (i.e., $A_{(j,i,:)}$ in (2)). The output of an $L$-layer MPGNN is $x^{(L)} = [x^{(L)}_1, \dots, x^{(L)}_{|V|}]^T$.
The extension from SGNNs to MPGNNs is simple but crucial, for two reasons. First, MPGNNs respect the permutation equivariance property in Proposition II.3. Second, MPGNNs enjoy theoretical guarantees in radio resource management problems (as discussed in Section IV). These two properties are unique to MPGNNs and are not enjoyed by SGNNs.
III-C Key Properties of MPGNNs
MPGNNs enjoy properties that are favorable to solving large-scale radio resource management problems, as discussed in the sequel.
We first show that MPGNNs satisfy the permutation equivariance property, which leads to easier training and better generalization.
Proposition III.1 (Permutation equivariance in MPGNNs). Viewing the input-output mapping of the MPGNNs defined in (7) as $x^{(L)} = \Phi(Z, A)$, we have

$$\Phi(PZ, P \star A \star P^T) = P\,\Phi(Z, A)$$

for any permutation matrix $P$.
Please refer to Appendix B for a detailed proof.
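The permutation equivariance of MPGNNs can also be checked numerically. The sketch below implements one layer of the form (7) in plain Python, with element-wise max as the permutation-invariant aggregation. Two hypothetical fixed functions, `msg` and `combine`, stand in for the learned sub-networks, and absent edges are marked with `None`. Relabeling the nodes and then applying the layer gives the same result as applying the layer and then relabeling its outputs.

```python
def mpgnn_layer(X, A, msg, combine, msg_dim):
    """One MPGNN layer with element-wise max aggregation.

    X[i]: state of node i (list); A[j][i]: edge feature of edge (j, i),
    or None if there is no edge. msg and combine are hypothetical
    stand-ins for the learned sub-networks.
    """
    n = len(X)
    out = []
    for i in range(n):
        # messages from in-neighbors j (edges j -> i)
        msgs = [msg(X[j], A[j][i]) for j in range(n) if A[j][i] is not None]
        agg = [max(col) for col in zip(*msgs)] if msgs else [0.0] * msg_dim
        out.append(combine(X[i], agg))
    return out


def msg(x_j, e_ji):          # hypothetical message function
    return [x_j[0] * e_ji, x_j[0] + e_ji]


def combine(x_i, agg):       # hypothetical combination function
    return [x_i[0] + agg[0] - 0.5 * agg[1]]


X = [[1.0], [2.0], [3.0]]
A = [[None, 0.5, 2.0],
     [1.0, None, None],
     [0.25, 4.0, None]]
perm = [1, 2, 0]
PX = [X[perm[i]] for i in range(3)]                              # P X
PA = [[A[perm[i]][perm[j]] for j in range(3)] for i in range(3)]  # P ⋆ A ⋆ P^T

out = mpgnn_layer(X, A, msg, combine, msg_dim=2)
out_perm = mpgnn_layer(PX, PA, msg, combine, msg_dim=2)
print(out_perm == [out[perm[i]] for i in range(3)])  # True
```

Equivariance holds because the max over the neighbors' messages does not depend on the order in which they are listed. Any order-independent aggregation, such as a sum, would preserve this property.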
Ability to generalize to different problem scales
In MLPs, the input and output sizes must be the same during training and testing. Hence, the problem size in the test dataset must be equal to or less than the problem size in the training dataset. This means that MLP-based methods cannot be directly applied to a different problem size. In MPGNNs, each node has a copy of two sub-neural networks, i.e., $\phi$ and $\alpha$, whose input-output dimensions do not depend on the problem scale. Thus, we can train MPGNNs on small-scale problems and apply them to large-scale problems.
Fewer training samples
The required number of training samples for MPGNNs is much smaller than that for MLPs. The first reason is training sample reuse. For each training sample, each node receives a permuted version of it and processes it with $\phi$ and $\alpha$. Thus, each training sample is reused $|V|$ times for training $\phi$ and $\alpha$, where $|V|$ is the problem scale. Second, the input and output dimensions of the aggregation and combination functions in MPGNNs are much smaller than those of the original problem, which allows the neural networks to use far fewer parameters.
High computational efficiency
In each layer, an aggregation function is applied to all the edges and a combination function is applied to all the nodes. Thus, the time complexity of each layer is $O(|E| + |V|)$, and the overall time complexity of an $L$-layer MPGNN is $O(L(|E| + |V|))$. Since $|E| \leq M|V|$, where $M$ is the maximal degree of the graph, the time complexity grows linearly with the number of agents when $M$ is bounded. Note that in MPGNNs, the aggregation and combination functions on each node can be executed in parallel. When MPGNNs are fully parallelized, e.g., on powerful GPUs, the time complexity is $O(LM)$, which is constant when the maximal degree of the graph is bounded. We will verify this observation via simulations in Fig. 4.
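The complexity argument can be illustrated by simply counting operations. The sketch below assumes unit cost per aggregation-side evaluation (one per edge) and per combination (one per node). It uses a hypothetical bounded-degree "ring" graph and hypothetical values of $L$ and $M$. It shows the sequential cost growing linearly with the number of nodes, while the parallel depth stays fixed.

```python
def layer_cost(num_nodes, edges):
    """Operation count of one MPGNN layer under a unit-cost assumption:
    one aggregation-side evaluation per edge plus one combination per node."""
    return len(edges) + num_nodes


def ring_graph(n, max_deg):
    """Hypothetical bounded-degree graph: node i receives edges from nodes
    i+1, ..., i+max_deg (mod n), so every in-degree equals max_deg."""
    return [((i + d) % n, i) for i in range(n) for d in range(1, max_deg + 1)]


L, M = 3, 4  # hypothetical number of layers and maximal degree
costs = {n: L * layer_cost(n, ring_graph(n, M)) for n in (10, 100, 1000)}
print(costs)               # {10: 150, 100: 1500, 1000: 15000}: linear in n
parallel_depth = L * M     # one worker per node, neighbors processed in sequence
print(parallel_depth)      # 12, independent of n
```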
III-D An Effective Implementation of MPGNNs
In this subsection, we propose an effective implementation of MPGNNs for radio resource management problems, named the wireless channel graph convolution network (WCGCN), which is able to incorporate both agent-related features and channel-related features. Designing an MPGNN (7) amounts to choosing the set aggregation function $\phi$ and the combination function $\alpha$.
As general set functions are difficult to implement, an efficient implementation of $\phi$ was proposed in , which has the following form:

$$\phi(\{x_1, \dots, x_n\}) = \rho\big(h(x_1), \dots, h(x_n)\big),$$

where $x_1, \dots, x_n$ are the elements in the set, $\rho$ is a simple function, e.g., max or sum, and $h$ is some existing neural network architecture, e.g., a linear mapping or an MLP. For $h$ and $\alpha$, linear mappings are adopted in popular GNN architectures (e.g., GCN and S2V). Nevertheless, as discussed in Section IV of , linear mappings have difficulty handling continuous features, which are ubiquitous in wireless networks (e.g., CSI). We adopt MLPs as $h$ and $\alpha$ for their approximation ability. Using MLPs as processing units enables the WCGCN to exploit complicated agent-related and channel-related features in wireless networks.

For the aggregation function $\phi$, we notice that the following property holds if we use $\rho = \max$.
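Theorem III.1 (Robustness to feature corruptions). Suppose $u: \mathcal{X} \to \mathbb{R}^p$ is such that $u(S) = \max_{x \in S} h(x)$, where the maximum is taken element-wise, and $f = \psi \circ u$ for some function $\psi$. Then,
(a) for every set $S$, there exist $\mathcal{C}_S \subseteq S$ and $\mathcal{N}_S \supseteq S$ such that $f(T) = f(S)$ for any $T$ satisfying $\mathcal{C}_S \subseteq T \subseteq \mathcal{N}_S$;
(b) $|\mathcal{C}_S| \leq p$.

Theorem III.1 states that $f(S)$ remains unchanged under corruptions of the input, as long as all the features in $\mathcal{C}_S$ are preserved and the corrupted set stays within $\mathcal{N}_S$. Moreover, $\mathcal{C}_S$ contains only a limited number of features, at most $p$. In wireless networks, this means that the output of a layer remains unchanged even when the CSI is heavily corrupted on some links. In other words, the layer is robust to missing CSI.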
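The robustness of max aggregation can be demonstrated on a toy example. In the sketch below, the per-neighbor features are made-up values standing in for the outputs of $h$, with $p = 2$. A "critical set" is formed by keeping, for each dimension, one element that attains the maximum, so its size is at most $p$. Removing or perturbing the remaining elements leaves the aggregated output unchanged, as long as the perturbed features stay element-wise below the current maximum. The helper names are hypothetical.

```python
def max_pool(features):
    """u(S): element-wise max over a collection of p-dimensional features h(x)."""
    return [max(col) for col in zip(*features)]


def critical_set(features):
    """Indices of elements attaining the max in at least one dimension.
    One index is kept per dimension, so the set has at most p elements."""
    u = max_pool(features)
    return {next(i for i, f in enumerate(features) if f[d] == u[d])
            for d in range(len(u))}


# Hypothetical per-neighbor features h(x) with p = 2, e.g., one per interfering link.
S = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5], [0.1, 0.3]]
crit = critical_set(S)                  # {0, 1}
u_S = max_pool(S)                       # [0.7, 0.9]

# Corrupt the non-critical links: drop element 3 (missing CSI) and perturb
# element 2 while keeping it element-wise below u(S).
T = [S[0], S[1], [0.0, 0.6]]
print(max_pool(T) == u_S)               # True
```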
We next specify the architecture of the WCGCN, which aligns with traditional optimization algorithms. First, in traditional optimization algorithms, each iteration outputs an updated version of the optimization variables. Similarly, in the WCGCN, each layer outputs an updated version of the optimization variables. Second, these algorithms are often time-invariant systems, e.g., gradient descent, WMMSE, and FPlinQ. Thus, we share weights among different layers of the WCGCN, and the updates are

$$\begin{aligned} y^{(l)}_i &= \mathrm{MLP2}\Big(x^{(l-1)}_i, \ \max_{j \in \mathcal{N}(i)} \mathrm{MLP1}\big(x^{(l-1)}_j, e_{j,i}\big)\Big), \\ x^{(l)}_i &= \beta\big(y^{(l)}_i\big), \end{aligned}$$

where MLP1 and MLP2 are two different MLPs, and $\beta$ is a normalization function that depends on the application. For example, for the power control problem, we constrain the power between $0$ and $P_{\max}$, and $\beta$ can be a (scaled) sigmoid function, i.e., $\beta(x) = P_{\max}/(1 + e^{-x})$.
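A WCGCN-style layer for power control can be sketched as follows. The same (shared) layer is applied repeatedly, and a scaled sigmoid keeps each output power inside $(0, P_{\max})$. This is an illustrative sketch under explicit assumptions: the weights of `MLP1` and `MLP2` are hypothetical, untrained values, each node state is a single scalar, and edges carry scalar features. It is not a trained model or the reference implementation.

```python
import math

P_MAX = 1.0  # hypothetical maximum transmit power


def beta(y):
    """Scaled sigmoid mapping any real y into (0, P_MAX), written in a
    numerically stable form to avoid overflow in math.exp."""
    if y >= 0:
        return P_MAX / (1.0 + math.exp(-y))
    z = math.exp(y)
    return P_MAX * z / (1.0 + z)


def mlp(params, x):
    """Tiny fully connected network: ReLU after every layer except the last."""
    for idx, (W, b) in enumerate(params):
        x = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
        if idx < len(params) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical, untrained parameters shared by every layer (weight sharing).
MLP1 = [([[0.5, 1.0], [-0.3, 0.8]], [0.0, 0.1]),            # input: [x_j, e_ji]
        ([[1.0, -0.5], [0.2, 0.4]], [0.0, 0.0])]
MLP2 = [([[1.0, -0.7, 0.3], [0.2, 0.5, -0.4]], [0.1, 0.0]),  # input: [x_i, agg]
        ([[0.9, -0.6]], [0.0])]


def wcgcn_layer(x, in_edges):
    """x[i]: scalar state (power) of node i; in_edges[i]: list of (j, e_ji)."""
    new_x = []
    for i, x_i in enumerate(x):
        msgs = [mlp(MLP1, [x[j], e_ji]) for j, e_ji in in_edges[i]]
        agg = [max(col) for col in zip(*msgs)] if msgs else [0.0, 0.0]
        y_i = mlp(MLP2, [x_i] + agg)[0]
        new_x.append(beta(y_i))
    return new_x


in_edges = [[(1, 0.3), (2, 0.1)], [(0, 0.2)], [(0, 0.4), (1, 0.6)]]
x = [0.5, 0.5, 0.5]          # hypothetical initial states
for _ in range(3):           # the same layer (same weights) applied three times
    x = wcgcn_layer(x, in_edges)
print([round(p, 4) for p in x])
```

In practice, MLP1 and MLP2 would be trained end to end with the unsupervised loss described in Section V. The sketch only illustrates the data flow and the output normalization.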
Besides the benign properties of MPGNNs, the WCGCN enjoys several desirable properties for solving large-scale radio resource management problems. First, the WCGCN can effectively exploit features in multi-antenna systems with heterogeneous agents (e.g., channel states in multi-antenna systems and users' weights in weighted sum rate maximization). This is because the WCGCN adopts MLPs as processing units instead of linear mappings. This enables it to solve a wider class of radio resource management tasks than existing works [6, 16, 9] (e.g., beamforming problems and weighted sum rate maximization). Second, it is robust to partial and imperfect CSI, as suggested by Theorem III.1.
IV Theoretical Analysis of MPGNN-based Radio Resource Management
In this section, we investigate performance and generalization of MPGNNs. We first prove the equivalence between MPGNNs and a class of distributed algorithms, which include many classic algorithms for radio resource management as special examples, e.g., WMMSE  and FPlinQ . Based on this observation, we analyze the performance of MPGNN-based methods for weighted sum rate maximization problem.
To provide theoretical guarantees for “learning to optimize” approaches to radio resource management, it is critical to understand the performance and generalization of neural network-based methods. Unfortunately, the training and generalization of neural networks are still open problems. We make two commonly adopted simplifications to make the performance analysis tractable. First, we focus on the MPGNN class instead of any specific neural network architecture such as GCNs. Following Lemma 5 and Corollary 6 in , we can design an MPGNN with MLP processing units that is as powerful as the MPGNN class, and thus this simplification serves our purpose well. Second, we aim to prove the existence of an MPGNN with a performance guarantee. In the simulations, we train the neural network with stochastic gradient descent on limited training samples, so we may not find the corresponding neural network parameters. While this may leave some gap between theory and practice, our result is an important first step. These two simplifications have been commonly adopted in the performance analysis of GNNs [37, 23, 1].
IV-B Equivalence of MPGNNs and Distributed Optimization
Compared with the neural network-based radio resource management, optimization-based radio resource management has been well studied. Thus, it is desirable to make connections between these two types of methods. In , the equivalence between some special types of GNNs and graph optimization algorithms was proved. Inspired by this result, we shall establish the equivalence between MPGNNs and a class of distributed radio resource management algorithms.
We first give a brief introduction to distributed local algorithms, following . The maximal degree of the nodes in the graph is assumed to be bounded. Distributed local algorithms are a class of iterative algorithms in a multi-agent system. In each iteration, each agent sends messages to its neighbors, receives messages from its neighbors, and updates its state based on the received messages. The algorithm terminates after a constant number of iterations.
We focus on a sub-class of distributed local algorithms, namely, multiset broadcasting distributed local algorithms (MB-DLAs), which include a wide range of radio resource management algorithms in wireless networks, e.g., DTP, WMMSE, and FPlinQ. “Multiset” and “broadcasting” refer to the ways of receiving and sending messages, respectively. Denote $x^{(l)}_i$ as the state of node $i$ at the $l$-th iteration; the MB-DLA is shown in Algorithm 1.
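The iteration structure of an MB-DLA, as described in the preceding paragraphs, can be sketched generically in Python. This is an illustrative reading of that description, not a transcription of Algorithm 1. In every iteration, each node broadcasts one message computed from its own state (the same message to all neighbors). It then receives the multiset of its in-neighbors' messages, represented here as a sorted list so the result cannot depend on arrival order, and updates its state locally. The example `message` and `update` functions are hypothetical.

```python
def mb_dla(init_states, in_neighbors, message, update, num_iters):
    """Sketch of a multiset broadcasting distributed local algorithm.

    init_states[i]: initial state of node i
    in_neighbors[i]: nodes whose broadcasts node i receives
    message(state) -> message broadcast by a node
    update(state, received_multiset) -> new state
    """
    states = list(init_states)
    for _ in range(num_iters):
        outgoing = [message(s) for s in states]                    # broadcast
        states = [update(states[i], sorted(outgoing[j] for j in in_neighbors[i]))
                  for i in range(len(states))]                     # receive multiset, update
    return states


# Hypothetical example: propagate the largest value along a directed path 0->1->2->3.
in_neighbors = [[], [0], [1], [2]]
result = mb_dla([5, 1, 2, 0], in_neighbors,
                message=lambda s: s,
                update=lambda s, msgs: max([s] + msgs),
                num_iters=2)
print(result)   # [5, 5, 5, 2]
```

After $L$ iterations, each node's state depends only on its $L$-hop neighborhood. This locality mirrors an $L$-layer MPGNN, where `message` and `update` play the roles of the aggregation and combination networks.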
The equivalence between MPGNNs and MB-DLAs is rooted in the similarity of their definitions. In each iteration of an MB-DLA, each agent aggregates messages from neighbor agents and updates its local state. In each layer of an MPGNN, each node aggregates features from neighbor nodes and updates its hidden state. The equivalence follows if we view the agents as nodes in a graph and the messages as features. The following proposition formally states the equivalence of MPGNNs and MB-DLAs.
Let MB-DLA($L$) denote the class of MB-DLAs with $L$ iterations and MPGNN($L$) the class of MPGNNs with $L$ layers. Then the following two conclusions hold.
For any MPGNN($L$), there exists a distributed local algorithm in MB-DLA($L$) that solves the same set of problems as the MPGNN($L$).
For any algorithm in MB-DLA($L$), there exists an MPGNN($L$) that solves the same set of problems as this algorithm.
Please refer to Appendix C for a detailed proof.
The equivalence allows us to analyze the performance of MPGNNs by studying the performance of MB-DLAs. The first result shows that MPGNNs are at most as powerful as MB-DLAs. The implication is that if we can prove that no MB-DLA is capable of solving a specific radio resource management problem, then MPGNNs cannot solve it either. This can be used to establish a performance upper bound for MPGNNs. The second result shows that MPGNNs are as powerful as MB-DLAs in radio resource management problems. This implies that if we can identify an MB-DLA that solves a radio resource management problem well, then there exists an MPGNN that performs at least as well. Its generalization is also as good as that of the corresponding MB-DLA. We shall give a specific example on sum rate maximization in the next subsection.
IV-C Performance and Generalization of MPGNNs
In this subsection, we use the tools developed in the last subsection to analyze the performance and generalization of MPGNNs in the sum rate maximization problem. The analysis is built on the observation that a classic algorithm for the sum rate maximization problem, i.e., WMMSE, is an MB-DLA under some conditions, which is formally stated below. We shall refer to the MB-DLA corresponding to WMMSE as WMMSE-DLA.
When the maximal number of interference neighbors is bounded by some constant, WMMSE with a constant number of iterations is an MB-DLA.
When the problem sizes in the training dataset and test dataset are the same, we can always assume that the number of interference neighbors is a common constant. The restriction of a constant number of interference neighbors only influences the generalization. Please refer to Appendix D for a detailed proof.
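The locality behind this result is visible in the standard single-antenna WMMSE power control iteration, i.e., the special case with one transmit antenna. In the sketch below, every sum formally runs over all users, but only links with nonzero gain (graph edges) contribute. Each node's update therefore needs quantities from its neighbors only. Several choices are assumptions made for illustration: real-valued channel amplitudes `h[j][k]` from transmitter `j` to receiver `k`, full-power initialization, and the variable names (`alpha` for the rate weights, `w` for the internal MSE weights).

```python
import math


def wmmse_power_control(h, alpha, sigma2, p_max, num_iters):
    """Single-antenna WMMSE sketch.

    h[j][k]: channel amplitude from transmitter j to receiver k (0 if no edge)
    alpha[k]: rate weight; sigma2[k]: noise power; returns powers v_k^2.
    """
    K = len(h)
    v = [math.sqrt(p_max)] * K  # full-power initialization

    def update_u_w(v):
        u, w = [], []
        for k in range(K):
            # receiver k only hears transmitters with h[j][k] != 0 (its in-neighbors)
            denom = sum(h[j][k] ** 2 * v[j] ** 2 for j in range(K)) + sigma2[k]
            u_k = h[k][k] * v[k] / denom                 # MMSE receive coefficient
            u.append(u_k)
            w.append(1.0 / (1.0 - u_k * h[k][k] * v[k]))  # inverse MSE weight
        return u, w

    u, w = update_u_w(v)
    for _ in range(num_iters):
        v = []
        for k in range(K):
            # transmitter k only affects receivers with h[k][j] != 0 (its out-neighbors)
            denom = sum(alpha[j] * w[j] * u[j] ** 2 * h[k][j] ** 2 for j in range(K))
            v_k = alpha[k] * w[k] * u[k] * h[k][k] / denom
            v.append(min(max(v_k, 0.0), math.sqrt(p_max)))  # project onto [0, sqrt(p_max)]
        u, w = update_u_w(v)
    return [vk ** 2 for vk in v]


print(wmmse_power_control([[1.0]], [1.0], [1.0], p_max=1.0, num_iters=5))  # [1.0]
print(wmmse_power_control([[1.0, 0.2], [0.3, 1.0]], [1.0, 1.0], [0.1, 0.1],
                          p_max=1.0, num_iters=10))
```

Each pass of the loop is one round of neighbor-only message exchange. This is why a fixed number of WMMSE iterations fits the MB-DLA template when the number of interference neighbors is bounded.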
The above result shows that WMMSE is an MB-DLA. Thus, when the problem sizes in the training dataset and test dataset are the same, there exists an MPGNN whose performance is as good as that of WMMSE. Since WMMSE is hand-crafted, it is not optimal in terms of the number of iterations. By employing an unsupervised loss function, we expect that MPGNNs can learn an algorithm that requires fewer iterations and possibly enjoys better performance. In Fig. 3, we observe that a -layer MPGNN outperforms WMMSE with iterations and a -layer MPGNN outperforms WMMSE with iterations.
To avoid excessive training cost, it is desirable to first train a neural network on small-scale problems and then generalize it to large-scale ones. An intriguing question is when such generalization is reliable. Compared with WMMSE, WMMSE-DLA has two constraints: both the number of iterations and the maximal number of interference neighbors should be bounded by some constants. As agents that are far away cause little interference, the number of interference neighbors can be assumed to be fixed when the user density is kept the same. As a result, the performance of MPGNNs relative to WMMSE with a fixed number of iterations remains stable when the user density in the test dataset is a constant multiple of that in the training dataset. We will verify this by simulations in Table IV and Table VII.
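A back-of-the-envelope calculation makes the density argument concrete. Assume transmitters are dropped uniformly at random in a square field of area A, and ignore boundary effects. Then the expected number of other transmitters within a hypothetical interference radius r of a receiver is (N-1)πr²/A. This stays roughly constant when A grows with N, and grows linearly with N when A is fixed. The function names and the numbers below are illustrative assumptions, not the simulation settings.

```python
import math

def field_length_for(n_links, density):
    """Side length (m) of a square field holding n_links at the given density (links/m^2)."""
    return math.sqrt(n_links / density)

def expected_neighbors(n_links, field_length, radius):
    """Expected number of other transmitters within `radius` of a receiver,
    for transmitters dropped uniformly in a square field (boundary effects ignored)."""
    area = field_length ** 2
    # Each of the other n_links - 1 transmitters lands in the disc with probability pi r^2 / area.
    return (n_links - 1) * math.pi * radius ** 2 / area
```

For example, take a density of 2×10⁻⁴ links/m² and r = 100 m. Both 50 links in a 500 m field and 5000 links in a 5000 m field then give about six interference neighbors. Packing 200 links into the 500 m field gives about 25.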
V Simulation Results
In this section, we provide simulation results to verify the effectiveness of the proposed neural network architecture for three applications. The first application is sum rate maximization in a Gaussian interference channel, which is a classic application for deep learning-based methods. We use this application to compare the proposed method with MLP-based methods  and optimization-based methods . The second application is weighted sum rate maximization, and the third application is beamformer design. The last two problems cannot be solved by existing methods in [6, 16, 9].
For the neural network setting, we adopt a -layer WCGCN implemented in PyTorch Geometric. We apply unsupervised training without labeled samples, and the loss function is defined as
where the expectation is taken over all the channel realizations. To optimize the neural network, we adopt the Adam optimizer with a learning rate .
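The following is a minimal sketch for single-antenna pairs. It assumes the loss is the negative weighted sum rate of (4), averaged over channel samples as an empirical version of the expectation. The sample format `(H, p, alpha, noise)` is a hypothetical convention. In an actual PyTorch Geometric training loop, the same expression would be written with tensors so that autograd can differentiate it with respect to the network parameters. This pure-Python version only evaluates it.

```python
import math

def weighted_sum_rate(H, p, alpha, noise):
    """Weighted sum rate (bit/s/Hz) of a single-antenna interference channel.

    H[k][j]  channel amplitude from transmitter j to receiver k
    p[k]     transmit power of pair k (the network output)
    alpha[k] weight of pair k (all ones gives the plain sum rate)
    noise    noise variance
    """
    K = len(p)
    total = 0.0
    for k in range(K):
        signal = H[k][k] ** 2 * p[k]
        interference = sum(H[k][j] ** 2 * p[j] for j in range(K) if j != k)
        total += alpha[k] * math.log2(1.0 + signal / (interference + noise))
    return total

def unsupervised_loss(batch):
    """Assumed loss: negative weighted sum rate averaged over a batch of
    (H, p, alpha, noise) samples, i.e. an empirical version of the expectation."""
    return -sum(weighted_sum_rate(*sample) for sample in batch) / len(batch)
```

Minimizing this loss maximizes the average sum rate without requiring labels, which is why the training is unsupervised.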
V-A Sum Rate Maximization
We first consider the sum rate maximization problem in a single-antenna Gaussian interference channel. This problem is a special case of (4) with , , and .
We consider the following benchmarks for comparison.
WMMSE: This is a classic optimization-based algorithm for sum utility maximization in MIMO interfering broadcast channels. We run WMMSE for iterations.
Strongest: We find a fixed proportion of pairs with the largest channel gain , and set the power of these pairs as while the power levels for the remaining pairs are set to . This is a simple baseline algorithm that uses no knowledge of the interference links.
PCNet: PCNet is an MLP-based method specifically designed for the sum rate maximization problem with single-antenna channels.
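A minimal sketch of the Strongest baseline follows. The powers assigned to the selected and remaining pairs are not given above. The sketch assumes the selected pairs transmit at a maximum power `p_max` and the others are switched off. Rounding `fraction * K` to the nearest integer is likewise an assumption.

```python
def strongest_baseline(direct_gains, fraction, p_max):
    """Activate the `fraction` of links with the largest direct-link gains.

    Assumption: selected links transmit at p_max and all others are switched off.
    Only direct-link gains are used; interference links are ignored entirely.
    """
    K = len(direct_gains)
    n_active = round(fraction * K)
    ranked = sorted(range(K), key=lambda k: direct_gains[k], reverse=True)
    active = set(ranked[:n_active])
    return [p_max if k in active else 0.0 for k in range(K)]
```

For example, `strongest_baseline([3.0, 1.0, 2.0, 4.0], 0.5, 1.0)` returns `[1.0, 0.0, 0.0, 1.0]`.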
We use training samples for WCGCN and training samples for PCNet. For a specific parameter setting of WCGCN (8), we set the hidden units of MLP1 in (8) as , MLP2 as , and as the sigmoid function. (The performance of WCGCN is not sensitive to the number of hidden units.) The performance of different methods is shown in Table I. The SNR and number of users are kept the same in the training and test datasets. For all the tables shown in this section, the entries are the (weighted) sum rates achieved by different methods, normalized by the sum rate of WMMSE. We see that both PCNet and WCGCN achieve near-optimal performance when the problem scale is small. As the problem scale becomes large, the performance of PCNet approaches that of Strongest, which shows that PCNet can hardly learn any valuable information about the interference links. In contrast, the performance of WCGCN is stable as the problem size increases. Thus, GNNs are more favorable than MLPs for medium-scale or large-scale problems.
We further compare the performance of WCGCN and WMMSE with different numbers of iterations. We use the system setting , SNR dB, and the results are shown in Fig. 3. From the figure, we see that a -layer WCGCN outperforms WMMSE with iterations and a -layer WCGCN outperforms WMMSE with iterations. This indicates that by adopting the unsupervised loss function, WCGCN can learn a much better message-passing algorithm than the hand-crafted WMMSE.
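To make the iteration count concrete, the sketch below implements WMMSE for the single-antenna special case with real channel amplitudes. It uses the standard scalar updates of the receive coefficient u_k, the MSE weight w_k, and the transmit amplitude v_k = sqrt(p_k). The sums run over all pairs. On a sparse interference graph, only neighbors contribute, which is why a fixed number of WMMSE iterations can be viewed as a fixed number of message-passing rounds. This is a plain-Python sketch under these assumptions, not the benchmark code behind Fig. 3.

```python
import math

def weighted_sum_rate(H, p, alpha, noise):
    """H[k][j]: channel amplitude from transmitter j to receiver k."""
    K = len(p)
    return sum(
        alpha[k] * math.log2(1.0 + H[k][k] ** 2 * p[k] /
                             (sum(H[k][j] ** 2 * p[j] for j in range(K) if j != k) + noise))
        for k in range(K))

def wmmse_power_control(H, alpha, noise, p_max, iters):
    """Scalar WMMSE for single-antenna power control; returns powers p[k] = v[k]**2."""
    K = len(alpha)
    v = [math.sqrt(p_max)] * K  # start every pair at full power
    for _ in range(iters):
        # Receive-coefficient update: total received power at receiver k plus noise.
        u = [H[k][k] * v[k] / (sum(H[k][j] ** 2 * v[j] ** 2 for j in range(K)) + noise)
             for k in range(K)]
        # MSE-weight update: w[k] is the inverse of the resulting MSE.
        w = [1.0 / (1.0 - u[k] * H[k][k] * v[k]) for k in range(K)]
        # Transmitter update over the receivers j reached by transmitter k,
        # projected onto the feasible interval [0, sqrt(p_max)].
        v = [min(max(alpha[k] * w[k] * u[k] * H[k][k] /
                     sum(alpha[j] * w[j] * u[j] ** 2 * H[j][k] ** 2 for j in range(K)),
                     0.0), math.sqrt(p_max))
             for k in range(K)]
    return [vk ** 2 for vk in v]
```

WMMSE is a block coordinate descent method, so the weighted sum rate it achieves never decreases from one iteration to the next. Calling `wmmse_power_control` with `iters=1, 2, ...` shows how much each extra iteration buys, which is the quantity that Fig. 3 compares against the number of WCGCN layers.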
V-B Weighted Sum Rate Maximization
In this application, we consider single-antenna transceiver pairs within a area. The transmitters are randomly located in the area, while each receiver is uniformly distributed within from the corresponding transmitter. We adopt the channel model from  and use training samples for each setting. To reduce the CSI training overhead, we assume that is available to WCGCN only if the distance between transmitter and receiver is within meters. To provide a performance upper bound, global CSI is assumed to be available to WMMSE. The weights for weighted sum rate maximization, i.e., in (4), are generated from a uniform distribution in in both the training and test datasets. For a specific parameter setting of WCGCN (8), we set the hidden units of MLP1 as , MLP2 as , and as the sigmoid function.
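The distance-based CSI assumption translates directly into the graph that WCGCN operates on. The sketch below generates a hypothetical layout: transmitters are uniform in a square, and each receiver sits at a uniformly drawn distance and angle from its own transmitter, with placeholder distance bounds. It keeps an interference edge from transmitter j to receiver k only when the two are within a given radius. The layout model, the function names, and all numbers are illustrative assumptions, not the channel model used in the experiments.

```python
import math

def drop_links(n_links, field_length, d_min, d_max, rng):
    """Hypothetical layout: transmitters uniform in a square, receivers at a
    uniform distance in [d_min, d_max] and uniform angle from their transmitter."""
    tx, rx = [], []
    for _ in range(n_links):
        x, y = rng.uniform(0, field_length), rng.uniform(0, field_length)
        d, theta = rng.uniform(d_min, d_max), rng.uniform(0, 2 * math.pi)
        tx.append((x, y))
        rx.append((x + d * math.cos(theta), y + d * math.sin(theta)))
    return tx, rx

def local_csi_edges(tx, rx, radius):
    """Directed interference edges (j, k): transmitter j lies within `radius` of
    receiver k, so the cross-link CSI from j to k is assumed available."""
    n = len(tx)
    return [(j, k) for j in range(n) for k in range(n)
            if j != k and math.dist(tx[j], rx[k]) <= radius]

def max_in_degree(edges, n):
    """Largest number of interference neighbors seen by any receiver."""
    deg = [0] * n
    for _, k in edges:
        deg[k] += 1
    return max(deg, default=0)
```

A typical call is `drop_links(20, 500.0, 5.0, 50.0, random.Random(1))` followed by `local_csi_edges(tx, rx, 100.0)`. The value of `max_in_degree` is the bounded neighbor count that the analysis in Section IV-C relies on.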
We first test the performance of WCGCN when the number of pairs is the same in the training and test datasets. Specifically, we consider pairs in a region. We test the performance of WCGCN with different values of and , as shown in Table II. The entries in the table are the sum rates achieved by different methods. We observe that WCGCN with local CSI achieves performance competitive with that of WMMSE with global CSI.
Next, to test the generalization capability of the proposed method, we train WCGCN on a wireless network with tens of users and test it on wireless networks with hundreds or thousands of users, as shown in the following two simulations.
Generalization to larger scales
We first train the WCGCN with pairs in a region. We then change the number of pairs in the test set while keeping the density of users (i.e., ) fixed. The results are shown in Table III. It can be observed that the performance is stable as the number of users increases, which shows that WCGCN generalizes well to larger problem scales, consistent with our analysis.
Generalization to higher densities
In this test, we first train the WCGCN with pairs in a region. We then change the number of pairs in the test set while fixing the area size. The results are shown in Table IV, with the performance loss compared with shown in brackets. The performance is stable up to a -fold increase in the density, and good performance is achieved even with a -fold increase in the density.
V-C Beamformer Design
In this subsection, we consider beamformer design for sum rate maximization in (4). Specifically, we consider transceiver pairs within a area, where the transmitters are equipped with multiple antennas and each receiver is equipped with a single antenna. The transmitters are generated uniformly in the area and the receivers are generated uniformly within from the corresponding transmitters. We adopt the channel model in  and use training samples for each setting. The assumption on the available CSI for WCGCN and WMMSE is the same as in the previous subsection. In WCGCN, a complex number is treated as two real numbers. For a specific parameter setting of WCGCN (8), we set the hidden units of MLP1 as , MLP2 as , and .
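Treating a complex number as two real numbers amounts to stacking real and imaginary parts before the features enter the MLPs, and undoing the stacking at the output. The sketch below shows this for a channel or beamforming vector. The final power projection is one common way to keep an output beamformer within a per-transmitter power budget. It is an assumption here and not necessarily the output activation used by WCGCN.

```python
import math

def complex_to_real(h):
    """Stack real parts, then imaginary parts: n complex entries -> 2n real features."""
    return [z.real for z in h] + [z.imag for z in h]

def real_to_complex(x):
    """Inverse of complex_to_real."""
    n = len(x) // 2
    return [complex(x[i], x[n + i]) for i in range(n)]

def project_power(v, p_max):
    """Scale a beamformer onto {v : ||v||^2 <= p_max} (assumed output step)."""
    norm_sq = sum(abs(z) ** 2 for z in v)
    if norm_sq <= p_max:
        return list(v)
    scale = math.sqrt(p_max / norm_sq)
    return [scale * z for z in v]
```

For example, `complex_to_real([1 + 2j, -0.5 + 0j, 3 - 1j])` gives `[1.0, -0.5, 3.0, 2.0, 0.0, -1.0]`, and `real_to_complex` recovers the original vector.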
We first test the performance of WCGCN when the number of pairs in the training dataset and the number of pairs in the test dataset are the same. Specifically, we consider pairs in a meters by meters region, and each transmitter is equipped with antennas. We test the performance of WCGCN with different and . The results are shown in Table V. We observe that WCGCN with local CSI achieves performance comparable to that of WMMSE with global CSI, demonstrating the applicability of the proposed method to multi-antenna systems.
Generalization to larger scales
We first train the WCGCN with pairs in a meters by meters region with . We then change the number of pairs while the density of users (i.e., ) is fixed. The results are shown in Table VI. The performance is stable as the number of users increases, which is consistent with our theoretical analysis.
Generalization to higher densities
We first train the WCGCN with pairs in a meters by meters region with . We then change the number of pairs while fixing the area size. The results are shown in Table VII and the performance loss is shown in brackets. The performance is stable up to a -fold increase in the density, and satisfactory performance is achieved up to a -fold increase in the density. The performance deteriorates as the density grows further, which indicates that extra training is needed when the density in the test dataset is much larger than that in the training dataset.
Computation time comparison
This test compares the running time of different methods for different problem scales. We run “WCGCN GPU” on a GeForce GTX 1080Ti and the other methods on an Intel(R) Xeon(R) CPU E5-2643 v4 @ 3.40GHz. The implementation of neural networks exploits the parallel computation of the GPU, while WMMSE cannot do so due to its sequential computation flow. The running time is averaged over problem instances and shown in Fig. 4. The speedup over WMMSE grows as the problem scale increases, owing to the low computational complexity of WCGCN. As shown in the figure, the running time of WCGCN CPU grows linearly with the problem size while that of WCGCN GPU is nearly constant, which is consistent with our analysis in Section III-C. Remarkably, WCGCN is able to solve the problem with users within milliseconds.
In this paper, we developed a scalable neural network architecture based on GNNs to solve radio resource management problems. In contrast to existing learning-based methods, we focused on the neural architecture design to meet the key performance requirements, including low training cost, high computational efficiency, and good generalization. Moreover, we theoretically connected learning-based methods and optimization-based methods, which sheds light on the performance guarantees of learning-to-optimize approaches. We believe that this investigation will have profound implications in both theoretical and practical aspects. As for future directions, it will be interesting to investigate the distributed deployment of MPGNNs for radio resource management in wireless networks, and to extend our theoretical results to more general application scenarios.
Appendix A Proof of Proposition II.2
Following Proposition II.1, we have
for any variable , adjacency matrix , and permutation matrix .
Appendix B Proof of Proposition III.1
In the original graph, denote the input feature of node as , the edge feature of edge as , and the output of the -th layer of node as . In the permuted graph, denote the input feature of node as , the edge feature of edge as , and the output of the -th layer for node as . Due to the permutation relationship, we have
We prove the result by induction. First, we have as in (11). Assume . In the -th layer, the following update rule is applied
Appendix C Proof of Theorem IV.1
In MB-DLAs, the maximal degree of nodes should be bounded by some constant, denoted by . The update of MB-DLA at the -th iteration can be written as
The update of an MPGNN at the -th layer can be written as
1) We first show that the inference stage of an MPGNN can be viewed as an MB-DLA, which is proved by induction. Before the algorithm and neural network start, both and are node features and thus . We assume . At the -th iteration, the message is passed from agent to agent . Then agent updates its local state as and . By doing so, we have .
2) We show that (13) can be written in the form of (14). Before the algorithm and neural network start, both and are node features and thus . We assume . Let . At the -th iteration, node aggregates features from neighbor nodes that form a multiset . We order the elements in according to their first coordinates. Let denote the function that selects the -th element in a multiset , , . Taking
and , we then obtain . This completes the proof.
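Step 2 relies on the degree bound. A multiset with at most D elements can be turned into a fixed-length vector by sorting and padding, after which a fixed-size map (and hence an MLP) can implement the update. The sketch below illustrates this encoding for message tuples. It sorts lexicographically, i.e., by the first coordinate with ties broken by the remaining coordinates, and pads with a hypothetical filler value. Both choices are assumptions made so that the encoding does not depend on the arrival order of the messages.

```python
def encode_multiset(messages, max_degree, pad):
    """Fixed-length, order-independent encoding of a bounded multiset.

    messages:   list of equal-length tuples (the multiset, in arbitrary order)
    max_degree: assumed bound D on the number of neighbors
    pad:        filler tuple used for missing neighbors
    """
    if len(messages) > max_degree:
        raise ValueError("number of messages exceeds the degree bound")
    # Sort by the first coordinate; tuple comparison breaks ties with later coordinates.
    ordered = sorted(messages)
    return ordered + [pad] * (max_degree - len(ordered))

def select(encoded, k):
    """Return the k-th element (1-indexed) of the encoded multiset."""
    return encoded[k - 1]
```

Any permutation of the same messages yields the same encoded list, so `select(encoded, k)` is a well-defined function of the multiset alone.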
Appendix D Proof of Proposition IV.1
WMMSE  is a classic algorithm for weighted sum rate maximization in MIMO interfering broadcast channels. The WMMSE algorithm considers a cell interfering broadcast channel where base station (BS) serves users. Denote as the channel from base station to user , as the beamformer that BS uses to transmit symbols to user , as the weight of user , and as the variance of the noise for user . The problem formulation is
The WMMSE algorithm is shown in Algorithm 2.
We first model this system as a graph. We treat the -th user as the -th node in the graph. The node features are . The internal state of node at the -th iteration is . An edge is drawn from the -th node to the -th node if there is an interference link between the -th BS and the -th user. The edge feature of the edge is .
We show that a WMMSE algorithm with iterations is an MB-DLA with at most iterations. We update the variables and at the odd iterations, and update the variable at the even iterations. Specifically, at the -th iteration with being an odd number, the -th node broadcasts its state along its edges. The edge processes the message by forming and the node receives the message set . The agent first sums over the messages . Then the -th node updates its internal state as