Black-Box and Modular Meta-Learning for Power Control via Random Edge Graph Neural Networks

by Ivana Nikoloska, et al.

In this paper, we consider the problem of power control for a wireless network with an arbitrarily time-varying topology, including the possible addition or removal of nodes. A data-driven design methodology that leverages graph neural networks (GNNs) is adopted in order to efficiently parametrize the power control policy mapping the channel state information (CSI) to transmit powers. The specific GNN architecture, known as random edge GNN (REGNN), defines a non-linear graph convolutional filter whose spatial weights are tied to the channel coefficients. While prior work assumed a joint training approach whereby the REGNN-based policy is shared across all topologies, this paper targets adaptation of the power control policy based on limited CSI data regarding the current topology. To this end, we propose both black-box and modular meta-learning techniques. Black-box meta-learning optimizes a general-purpose adaptation procedure via (stochastic) gradient descent, while modular meta-learning finds a set of reusable modules that can form components of a solution for any new network topology. Numerical results validate the benefits of meta-learning for power control problems over joint training schemes, and demonstrate the advantages of modular meta-learning when data availability is extremely limited.



I Introduction

I-A Motivation

With the proliferation of wireless devices and services, wireless communication networks are becoming increasingly complex. Beyond 5G (B5G) networks are expected to provide uninterrupted connectivity to devices ranging from sensors and cell phones to vehicles and robots, calling for the development of novel interference management strategies via radio resource management (RRM). However, solving most RRM problems is NP-hard, making it challenging to derive an optimal solution in all but the simplest scenarios [mollanoori2013uplink].

Solutions to this problem run the gamut from classical optimization techniques [lei2015joint] to information and game theory [yang2017mean, riaz2018power]. As emerging applications demand growth in scale and complexity, modern machine learning techniques have also been explored as alternatives to solve RRM problems in the presence of model and/or algorithmic deficits [simeone2018very]. The performance of trained models generally depends on how representative the training data are of the channel conditions encountered at deployment time. As a result, when conditions in the network change, these rigid models are often no longer useful [nair2019covariate], [quinonero2009dataset].

A fundamental RRM problem is the optimization of transmission power levels at distributed links that share the same spectral resources in the presence of time-varying channel conditions [chiang2008power]. This problem was addressed by the data-driven methodology introduced in [eisen2020optimal], and later studied in [eisen2020transferable, naderializadeh2020wireless, chowdhury2020unfolding]. In this methodology, the power control policy mapping channel state information (CSI) to the power vector is parametrized by a graph neural network (GNN). The GNN encodes information about the network topology through its underlying graph, whose edge weights are tied to the channel realizations. The design problem consists of training the weights of the graph filters, while tying the spatial weights applied by the GNN to the CSI. As a result, the solution – which is referred to as random edge GNN (REGNN) – automatically adapts to time-varying CSI conditions.

In this paper, we focus on the higher-level problem of facilitating adaptation to time-varying topologies. To this end, as illustrated in Fig. 1, we assume that the topology of the network varies across periods of operation of the system, with each period being characterized by time-varying channel conditions as in [eisen2020optimal]. As such, the operation within each channel period is well reflected by the model studied in [eisen2020optimal, naderializadeh2020wireless], and we adopt an REGNN architecture for within-period adaptation. At the beginning of each period, the network designer is given limited CSI data that can be used to adapt the REGNN-based power control policy to the changed topology. In order to facilitate fast adaptation – in terms of data and iteration requirements – we integrate meta-learning with REGNN training.

I-B Meta-learning

The goal of meta-learning is to extract shared knowledge, in the form of an inductive bias, from data sets corresponding to distinct learning tasks in order to solve held-out tasks more efficiently [schmidhuber1987evolutionary, thrun1998lifelong]. The inductive bias may refer to parameters of a general-purpose learning procedure, such as the learning rate [maclaurin2015gradient], or initialization [finn2017model], [nichol2018firstorder], [grant2018recasting] of (stochastic) gradient descent (S)GD. These schemes can be credited for much of the reinvigorated interest in meta-learning in the previous decade. We will refer to them as black-box meta-learning methods, given their model-agnostic applicability via fast parametric generalization.

In contrast, modular meta-learning aims at fast combinatorial generalization [chomsky2014aspects], making, in a sense, "infinite use of finite means" [von1999humboldt]. Modular meta-learning generalizes to new tasks by optimizing a set of neural network modules that can be composed in different ways to solve a new task, without changing their internal parameters [alet2018modular], [alet2019neural]. Modularity is a key property of engineered systems, due to its fault tolerance, interpretability, and flexibility [baldwin2006modularity], but is generally lacking in data-driven solutions, which often amount to large black-box input-output mappings. The few existing modular meta-learning approaches rely on simulated annealing to find a suitable module composition for each task given the current neural network modules [alet2018modular]. These, however, are notoriously inefficient optimization methods (in terms of computation time), and more recent techniques integrate learnt proposal functions in order to speed up training [alet2019neural].

I-C Contributions

As illustrated in Fig. 1, the main goal of this paper is to optimize fast adaptation procedures for the power control policy to time-varying network configurations. To do so, we consider both black-box and modular meta-learning methods. In particular, the contributions of this paper can be summarized as follows:

  • We integrate first-order model agnostic meta-learning (FOMAML) [finn2017model], as a state-of-the-art representative of black-box meta-learning methods, with REGNN training;

  • We introduce a novel modular meta-learning method that constructs a repository of fixed graph filters that can be combined to define REGNN-based power control models for new network configurations. In contrast to existing modular meta-learning schemes that rely on variants of simulated annealing, the proposed method adopts a stochastic module assignment based on the Gumbel-softmax reparametrization trick [maddison2016concrete], which enables optimization via standard SGD;

  • We validate the performance of all meta-learning methods with extensive experiments that provide comparisons with joint training schemes [eisen2020optimal]. The use of meta-learning for power control problems in wireless networks is validated, and a comparative study of the performance of the considered meta-learning solutions is presented.

I-D Prior Work

GNNs are enjoying increasing popularity in the wireless communication community. In addition to power allocation [eisen2020optimal, eisen2020transferable, naderializadeh2020wireless, chowdhury2020unfolding], GNNs have been used to address cellular [zhao2020cellular] and satellite [yang2020noval] traffic prediction, link scheduling [lee2020graph], channel control [tekbiyik2020channel], and localization [yan2021graph]. Due to their localized nature, GNNs have also been applied to cooperative [dong2020drl] and decentralized [lee2021decentralized] control problems in networked systems. A review of the use of GNNs in wireless communication can be found in [he2021overview].

Meta-learning has been shown to improve the training and adaptation efficiency in various problems in wireless communications, ranging from demodulation [park2020learning] and decoding [jiang2019mind], to channel estimation [mao2019roemnet] and beamforming [yuan2020transfer]. In particular, in [park2020learning] the authors use pilots from previous transmissions of Internet of Things (IoT) devices in order to adapt a demodulator to new channel conditions using few pilot symbols. The authors of [mao2019roemnet] train a neural network-based channel estimator for orthogonal frequency-division multiplexing (OFDM) systems with FOMAML in order to obtain an effective solution given a small number of samples. Reference [yuan2020transfer] studies fast beamforming in multiuser multiple-input single-output (MISO) downlink systems. An overview of meta-learning methods, with applications to wireless communication networks, is available in [simeone2020learning].

The application of meta-learning to GNN-based power control was presented in the conference version of this paper for the first time [nikoloska2021fast]. In particular, [nikoloska2021fast] considers black-box methods and offers preliminary experimental results. In contrast to the preliminary conference version [nikoloska2021fast], in this paper, we consider both black-box and modular meta-learning solutions, and we provide a more comprehensive numerical evaluation of all considered meta-learning schemes. To the best of the authors’ knowledge, this is the first work investigating the use of modular meta-learning in communication engineering problems.

The rest of the paper is organized as follows. The considered model and problem are presented in Section II, and REGNNs are reviewed in Section III. Meta-learning is introduced in Section IV, and black-box methods and the proposed modular solution are given in Section V and Section VI, respectively. All meta-learning schemes are evaluated in Section VII. Section VIII concludes the paper.

II Model and Problem

Figure 1: Interference graph over periods $t-1$, $t$, and $t+1$. Each vertex represents a communication link, and an edge is included between interfering links.

As illustrated in Fig. 1, we consider a wireless network running over periods $t = 1, 2, \ldots$, with the topology possibly changing at each period $t$. During period $t$, the network comprises $K_t$ communication links. Transmissions on the links are assumed to occur at the same time using the same frequency band. The resulting interference graph $\mathcal{G}_t$ includes an edge between any pair of links whose transmissions interfere with one another. We denote by $\mathcal{N}_k^t$ the subset of links that interfere with link $k$ at period $t$. Both the number of links $K_t$ and the topology defined by the edge set of $\mathcal{G}_t$ generally vary across periods $t$.

Each period contains $N$ time slots, indexed by $n = 1, \ldots, N$. In time slot $n$ of period $t$, the channel between the transmitter of link $k$ and its intended receiver is denoted by $h_{kk}^t(n)$, while $h_{kj}^t(n)$ denotes the channel between the transmitter of link $k$ and the receiver of link $j \neq k$. Channels account for both slow and fast fading effects, and, by definition of the interference graph $\mathcal{G}_t$, we have $h_{kj}^t(n) = 0$ for $k \notin \mathcal{N}_j^t$. The channels for slot $n$ in period $t$ are arranged in the channel matrix $H_t(n)$, whose $(k, j)$-th entry is given by $h_{kj}^t(n)$. Channel states vary across time slots, and the marginal distribution of matrix $H_t(n)$, which is constant across all slots $n$, is denoted by $P_{H_t}$. The distribution $P_{H_t}$ generally changes across periods $t$, and it is a priori unknown to the network.

To manage inter-link interference, it is useful to adjust the transmit powers such that a global network-wide objective function is optimized (see, e.g., [douros2011review]). For each channel realization $H_t(n)$, we denote the vector of power allocation variables as $p = [p_1, \ldots, p_{K_t}]^T$, whose $k$-th component, $p_k$, represents the per-symbol transmit power of transmitter $k$ at time slot $n$ of period $t$. The resulting achievable rate in bits per channel use for link $k$ is given by

$$c_k(H_t(n), p) = \log_2\left(1 + \frac{|h_{kk}^t(n)|^2 p_k}{\sigma^2 + \sum_{j \in \mathcal{N}_k^t} |h_{jk}^t(n)|^2 p_j}\right), \tag{1}$$

where $\sigma^2$ denotes the per-symbol noise power. By (1), interference is treated as worst-case additive Gaussian noise.
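As a small illustration of (1), the per-link rates can be computed from a channel matrix and a power vector as follows. This is a sketch, not the paper's code: the helper name and the convention that `H[j, k]` holds the gain from transmitter $j$ to receiver $k$ are our own choices.

```python
import numpy as np

def achievable_rates(H, p, noise_power=1.0):
    """Per-link rates under the interference model of Eq. (1).

    H[j, k] is the channel gain from transmitter j to receiver k
    (diagonal = direct links); interference is treated as additive
    Gaussian noise at each receiver.
    """
    direct = np.abs(np.diag(H)) ** 2 * p              # |h_kk|^2 p_k
    # total received power at receiver k minus the direct term
    interference = (np.abs(H) ** 2).T @ p - direct    # sum over j != k
    sinr = direct / (noise_power + interference)
    return np.log2(1.0 + sinr)
```

With an identity channel matrix (no cross-link interference) and unit powers, each link sees an SINR of one and hence a rate of one bit per channel use.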

The goal of the system is to determine a power allocation policy $\phi_t(\cdot)$ in each period $t$ that maps the channel matrix $H_t(n)$ to a power allocation vector as

$$p = \phi_t(H_t(n)) \tag{2}$$

by maximizing the average achievable sum-rate. This yields the stochastic optimization problem

$$\max_{\phi_t(\cdot)} \;\; \mathbb{E}_{H \sim P_{H_t}}\left[\sum_{k=1}^{K_t} c_k\big(H, \phi_t(H)\big)\right] \quad \text{s.t.} \quad 0 \leq [\phi_t(H)]_k \leq P_k^{\max}, \tag{II}$$

where $P_k^{\max}$ denotes the power constraint of link $k$. Note that problem (II) is defined separately for each period $t$. Since the distribution $P_{H_t}$ is unknown, problem (II) cannot be addressed directly.

We assume, however, that the designer has access to the channel realizations $\mathcal{D}_t = \{H_t(n)\}_{n=1}^{N}$ over the $N$ time slots in period $t$. Accordingly, problem (II) can be approximated by estimating the objective in (II) via an empirical average as in

$$\max_{\phi_t(\cdot)} \;\; \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K_t} c_k\big(H_t(n), \phi_t(H_t(n))\big). \tag{4}$$
Figure 2: A graph convolutional filter is a polynomial in a matrix representation of the interference graph. As the order $k$ of the filter grows, information is aggregated from nodes that are farther apart (shown in green for the output corresponding to the node in red).

III Power Allocation by Training REGNN

In this section, we review the solution proposed in [eisen2020optimal], which tackles problem (II) separately for each period $t$. The approach in [eisen2020optimal] parametrizes the power allocation function in (2) by a REGNN as

$$\phi_t(H_t(n)) = \phi\big(H_t(n); \theta_t\big), \tag{5}$$

where $\theta_t$ is a vector of trainable parameters. In the rest of this section, we first describe the mapping $\phi(\cdot; \theta)$ implemented by a REGNN, and then we review the problem of optimizing the parameter vector $\theta$. Unless stated otherwise, in this section we drop the index $t$, which is fixed, in order to simplify notation.

III-A REGNN Model

To introduce the REGNN model, let us first describe the key operation of graph filtering. Consider a graph $\mathcal{G} = (\mathcal{K}, \mathcal{E})$, with nodes in set $\mathcal{K}$ and edge set $\mathcal{E}$. We associate to graph $\mathcal{G}$ a matrix $S$, known as the graph shift operator (GSO), with the property that $[S]_{kj} = 0$ holds for all pairs $(k, j) \notin \mathcal{E}$ with $k \neq j$. Note that the channel matrix $H$ satisfies this condition for the interference graph. A graph signal is a vector $x$, with each entry being assigned to one of the nodes in the graph. Given a vector $\alpha = [\alpha_0, \ldots, \alpha_{K-1}]^T$ of $K$ filter taps, a graph filter applies the graph convolution [sandryhaila2013discrete]

$$z = \sum_{k=0}^{K-1} \alpha_k S^k x \tag{6}$$

to an input graph signal $x$. The filter is a polynomial in the matrix $S$.

As illustrated in Fig. 2, each $k$-th power $S^k$ of the GSO in (6) performs a $k$-hop shift of the elements in vector $x$ on the graph. Specifically, the term $Sx$ is a vector whose entry for each node aggregates the entries in vector $x$ corresponding to single-hop neighbouring nodes, each weighted by the corresponding channel element of the GSO; the term $S^2 x$ aggregates for each node the contributions in vector $x$ associated to two-hop neighbouring nodes; and so on. As illustrated in Fig. 2, as the order $k$ increases, node inputs from larger neighborhoods are incorporated. Thus, the graph convolution implements a local message-passing procedure, with information from larger neighbourhoods being aggregated as the filter size $K$ in (6) increases.
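The polynomial filter in (6) can be sketched in a few lines. The code below is illustrative (names are our own); it accumulates the filter output while computing the successive shifts $S^k x$ iteratively, so that no explicit matrix power is formed.

```python
import numpy as np

def graph_filter(S, x, alpha):
    """Graph convolution z = sum_k alpha[k] * S^k x, as in Eq. (6).

    S     : GSO (e.g., the channel matrix of the interference graph)
    x     : input graph signal, one entry per node
    alpha : filter taps alpha_0, ..., alpha_{K-1}
    """
    z = np.zeros_like(x, dtype=float)
    shifted = x.astype(float)          # S^0 x
    for a in alpha:
        z += a * shifted               # add alpha_k * S^k x
        shifted = S @ shifted          # next-hop shift S^(k+1) x
    return z
```

Each pass of the loop corresponds to one additional hop of message passing, matching the aggregation behaviour described above.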

An REGNN consists of a layered architecture in which each layer is a composition of a graph convolution and a per-node non-linearity. The graph convolution in each layer uses the current channel matrix $H$ as the GSO in (6). Due to its dependence on the random fading channels, the graph convolution is characterized by "random edges" according to the terminology used in [eisen2020optimal]. Given the current channel matrix $H$, the output of each $l$-th intermediate layer is given as

$$z_l = \sigma\left(\sum_{k=0}^{K-1} \alpha_{l,k} H^k z_{l-1}\right), \tag{7}$$

where $\sigma(\cdot)$ denotes a non-linear function, such as a rectified linear unit (ReLU) or a sigmoid, that is applied separately to each of the entries in the input. The REGNN is defined by the recursive application of (7) for $L$ layers, with the input to the first layer given by the input graph signal $z_0 = x$. In this paper, the input signal is set to an all-one vector [naderializadeh2020wireless], but it may more generally include a variable describing the state of each link [eisen2020optimal].

The transmit power in (5) is found as the output of the final, $L$-th layer of the REGNN as

$$p = \phi(H; \theta) = D_{\max} \, \sigma\left(\sum_{k=0}^{K-1} \alpha_{L,k} H^k z_{L-1}\right), \tag{8}$$

with $D_{\max}$ being a diagonal matrix whose $k$-th element on the main diagonal is given by $P_k^{\max}$, and $\theta = \{\alpha_l\}_{l=1}^{L}$ denoting the model parameters (convolution taps) for the $L$ layers. By (8), specifying the REGNN architecture requires defining the number of layers $L$ and the number of filter taps per layer. Assuming all layers have an equal number $K$ of taps, the total number of trainable parameters is thus $LK$, a number considerably smaller than what would be required to train a fully-connected neural network.
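The full forward pass of (7)-(8) can be sketched as follows, again with illustrative names rather than the paper's implementation: hidden layers use a ReLU, and the output layer uses a sigmoid scaled by the power budget so that the constraint $0 \le p_k \le P^{\max}$ is met by construction.

```python
import numpy as np

def regnn_forward(H, taps, p_max=1.0):
    """REGNN power-policy sketch: L graph-filter layers (Eq. (7)),
    sigmoid output scaled to the power budget (Eq. (8)).

    taps : list of per-layer tap vectors [alpha_{l,0}, ..., alpha_{l,K-1}]
           (the trainable parameters theta)
    """
    z = np.ones(H.shape[0])                       # all-one input signal
    for l, alpha in enumerate(taps):
        y = sum(a * np.linalg.matrix_power(H, k) @ z
                for k, a in enumerate(alpha))     # graph convolution
        if l < len(taps) - 1:
            z = np.maximum(y, 0.0)                # ReLU at hidden layers
        else:
            z = p_max / (1.0 + np.exp(-y))        # sigmoid output in [0, p_max]
    return z
```

Note that the output always lies in $[0, P^{\max}]$, regardless of the taps, which is why the power constraint does not need to be enforced separately during training.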

III-B Training a REGNN

Given a set $\mathcal{D}_t = \{H_t(n)\}_{n=1}^{N}$ of channel realizations for a given period $t$, training of the REGNN parameters $\theta$ is done by tackling the unsupervised learning problem

$$\max_{\theta} \;\; \hat{C}(\theta \,|\, \mathcal{D}_t) = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K_t} c_k\big(H_t(n), \phi(H_t(n); \theta)\big) \tag{9}$$

via (S)GD. Note that problem (9) restricts the optimization in (II) to the class of REGNNs in (8). By incorporating the channel matrices in the structure of the REGNN-based power control policy $\phi(\cdot; \theta)$, the method proposed in [eisen2020optimal] automatically adapts to the different per-slot channel realizations.
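The unsupervised training loop for (9) can be sketched as below. This is a stand-in, not the paper's code: the policy is passed in abstractly, and the gradient of the empirical sum-rate is taken by finite differences to keep the sketch dependency-free, whereas in practice one would use automatic differentiation.

```python
import numpy as np

def sum_rate(H, p, noise=1.0):
    """Empirical per-realization sum-rate, as inside the objective (9)."""
    direct = np.abs(np.diag(H)) ** 2 * p
    interference = (np.abs(H) ** 2).T @ p - direct
    return np.log2(1.0 + direct / (noise + interference)).sum()

def train_policy(channels, taps, policy, lr=0.1, steps=50):
    """Gradient-ascent sketch for problem (9): maximize the empirical
    sum-rate over sampled channel matrices.  `policy(H, taps)` returns a
    power vector; finite differences stand in for autodiff gradients."""
    theta = np.concatenate([a.ravel() for a in taps])
    shapes = [a.shape for a in taps]

    def unpack(v):
        out, i = [], 0
        for s in shapes:
            n = int(np.prod(s))
            out.append(v[i:i + n].reshape(s))
            i += n
        return out

    def objective(v):
        ts = unpack(v)
        return np.mean([sum_rate(H, policy(H, ts)) for H in channels])

    eps = 1e-4
    for _ in range(steps):
        grad = np.array([(objective(theta + eps * e) - objective(theta - eps * e))
                         / (2 * eps) for e in np.eye(theta.size)])
        theta += lr * grad          # ascent on the empirical sum-rate
    return unpack(theta)
```

Because the objective is the (unsupervised) sum-rate itself, no labeled power allocations are needed; the channel samples in $\mathcal{D}_t$ are the only training data.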

IV Meta-learning Power Control

Our main goal in this paper is to improve the data efficiency of the REGNN solution reviewed in the previous section by enabling the explicit adaptation of the power control policy to the interference graph of each period , and hence across the changing topologies (see Fig. 1). To this end, we propose to transfer knowledge across a number of previously observed topologies in the form of an adaptation procedure for the power control policy. This is done by meta-learning.

In order to enable meta-learning, we assume the availability of channel information from $T$ previous periods. We denote the meta-training data set as $\mathcal{D} = \{\mathcal{D}_t\}_{t=1}^{T}$, with $\mathcal{D}_t = \{H_t(n)\}_{n=1}^{N}$ being the channel matrices available for each period $t$. Following standard practice in meta-learning, each meta-training data set $\mathcal{D}_t$ is split into training data $\mathcal{D}_t^{\text{tr}}$ and testing data $\mathcal{D}_t^{\text{te}}$ [finn2017model], [simeone2020learning], and we write $\mathcal{N}_t^{\text{tr}}$ and $\mathcal{N}_t^{\text{te}}$ to denote the indices of the slots assigned to each set. At test time, during deployment, the network observes a new topology at a period $T+1$, for which it has access to a data set $\mathcal{D}_{T+1}^{\text{tr}}$, which is generally small, to optimize the power allocation strategy.

The idea underlying meta-learning is to leverage the historical data $\mathcal{D}$ in order to optimize a learning algorithm $\mathcal{A}(\cdot)$ that uses training data to obtain a well performing REGNN parameter vector $\theta_t = \mathcal{A}(\mathcal{D}_t^{\text{tr}})$ for any new period $t$, even when the training data set is of limited size. In practice, the training algorithm is either explicitly or implicitly defined by the solution of the learning problem (9) using the training data $\mathcal{D}_t^{\text{tr}}$. The meta-training objective is represented as the optimization problem

$$\max \;\; \sum_{t=1}^{T} \hat{C}\big(\mathcal{A}(\mathcal{D}_t^{\text{tr}}) \,\big|\, \mathcal{D}_t^{\text{te}}\big), \tag{10}$$

where the testing part $\mathcal{D}_t^{\text{te}}$ of the per-period data set is used to obtain an unbiased estimate of the sum-rate in (II).
In the next two sections, we propose two approaches to formulate and solve the meta-learning problem (10). First, we adopt black-box meta-learning strategies that are based on a model-agnostic optimization approach [finn2017model],[nichol2018firstorder]. Then, we introduce a novel modular meta-learning method, which aims at discovering common structural elements for the power allocation strategies across different interference graphs.

Figure 3: (Top) Black-box strategies such as FOMAML optimize a representation that can be quickly adjusted to solve a new task. (Bottom) Modular meta-learning methods optimize a repertoire of modules that can be quickly recombined at runtime to solve a new task.

V Black-box Meta-learning

Black-box meta-learning addresses the meta-learning problem (10) by adopting a general-purpose optimizer for the per-period learning problem (9) as the adaptation procedure . Specifically, we adopt model agnostic meta-learning (MAML), a state-of-the-art meta-learning technique whose key idea is parametrizing the algorithm with an initialization vector used to tackle the inner problem (9) via SGD. In this section, we first develop MAML, as well as its simplified version, FOMAML, for power allocation via REGNNs. Then, we observe that black-box meta-learning does not affect the permutation equivariance of REGNNs highlighted in [eisen2020optimal].


V-A MAML and FOMAML

MAML and FOMAML parametrize the adaptation algorithm $\mathcal{A}(\cdot)$ with the initialization vector $\theta$. Accordingly, assuming for simplicity a single step of gradient descent for problem (9), we have the training algorithm

$$\theta_t(\theta) = \theta + \eta \nabla_\theta \hat{C}\big(\theta \,\big|\, \mathcal{D}_t^{\text{tr}}\big), \tag{11}$$

where $\eta > 0$ denotes the learning rate and we have made explicit the dependence on the initialization $\theta$ in the notation $\theta_t(\theta)$; since (9) is a maximization, the step in (11) moves along the gradient. The update (11) can be directly generalized to include multiple GD steps, as well as a reduced size of the mini-batch to implement SGD. Furthermore, the same update, and generalizations thereof, apply also to the meta-test period $T+1$, yielding the model parameters $\theta_{T+1}(\theta)$.

With definition (11) of the training algorithm, MAML addresses the optimization problem (10), which is restated as the maximization

$$\max_{\theta} \;\; \sum_{t=1}^{T} \hat{C}\big(\theta_t(\theta) \,\big|\, \mathcal{D}_t^{\text{te}}\big) \tag{12}$$

over the initialization $\theta$.

For the single GD update in (11), the meta-training problem in (12) is addressed by MAML using GD, which updates the initialization in the outer loop as

$$\theta \leftarrow \theta + \kappa \sum_{t=1}^{T} \left(I + \eta \nabla^2_\theta \hat{C}\big(\theta \,\big|\, \mathcal{D}_t^{\text{tr}}\big)\right) \nabla \hat{C}\big(\theta_t(\theta) \,\big|\, \mathcal{D}_t^{\text{te}}\big), \tag{13}$$

where $I$ denotes the identity matrix and $\kappa > 0$ denotes the learning rate. Extensions to SGD are straightforward.

The MAML update in (13) requires the computation of the Hessian of the REGNN mapping (8) with respect to the model parameters, which can be expensive. First-order methods, such as FOMAML [finn2017model], aim at circumventing the need for the computation of higher-order derivatives. In particular, FOMAML ignores the Hessian terms in the update of the shared parameters in (13), obtaining the update

$$\theta \leftarrow \theta + \kappa \sum_{t=1}^{T} \nabla \hat{C}\big(\theta_t(\theta) \,\big|\, \mathcal{D}_t^{\text{te}}\big). \tag{14}$$
Algorithm 1 provides a summary of FOMAML for power allocation. The algorithm has a nested loop structure, with the outer loop updating the shared initialization parameters and the inner loop carrying out the local model updates in (11).
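The nested-loop structure of the updates (11) and (14) can be sketched as follows. All names are illustrative stand-ins, not the paper's implementation: `grad_fn(theta, data)` abstracts the gradient of the empirical sum-rate, which in practice would come from automatic differentiation of (9).

```python
import numpy as np

def fomaml(meta_tasks, theta0, grad_fn, lr_in=0.05, lr_out=0.01, rounds=100):
    """FOMAML sketch: one inner ascent step per sampled period (Eq. (11)),
    then an outer step along the adapted parameters' test gradient
    (Eq. (14)), ignoring all second-order terms.

    meta_tasks : list of (train_data, test_data) pairs, one per period
    grad_fn    : grad_fn(theta, data) -> gradient of the objective
    """
    theta = theta0.astype(float)
    for _ in range(rounds):
        outer_grad = np.zeros_like(theta)
        for train_data, test_data in meta_tasks:
            # inner adaptation from the shared initialization, Eq. (11)
            phi = theta + lr_in * grad_fn(theta, train_data)
            # first-order outer gradient evaluated at the adapted point
            outer_grad += grad_fn(phi, test_data)
        theta += lr_out * outer_grad / len(meta_tasks)   # Eq. (14)
    return theta
```

At runtime, adaptation to a new period reuses only the inner step, starting from the meta-learned `theta`.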

V-B Permutation Equivariance and Invariance

An important property of REGNNs is their equivariance to permutations [sandryhaila2013discrete]. In the context of wireless networks, this implies that a relabelling or reordering of the transmitters in the network produces the same permutation of the power allocation obtained via the REGNN (8). In this subsection, we briefly review this important property, and observe that the solution provided by black-box meta-learning is also permutation invariant.

Formally, let $\Pi$ denote a permutation matrix, so that the product $\Pi^T x$ reorders the entries of any given vector $x$, and the product $\Pi^T H \Pi$ reorders the rows and columns of any given matrix $H$. The output of the REGNN is permutation equivariant in the sense that, for a permutation matrix $\Pi$ and channel matrix $H$, we have

$$\phi\big(\Pi^T H \Pi; \theta\big) = \Pi^T \phi(H; \theta). \tag{15}$$
This structural property is not satisfied by general fully-connected models, in which a restructuring of the network would require an equivalent permutation of the weights. The equivariance of the optimal power control policy encodes the structure of the problem, as the labeling of the nodes is generally arbitrary.
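The equivariance property can be checked numerically on the graph-filter building block, since (15) follows from the identity $(\Pi^T H \Pi)^k = \Pi^T H^k \Pi$ applied layer by layer. The snippet below is a self-contained check under our own notation (here `P` plays the role of $\Pi^T$).

```python
import numpy as np

def graph_filter(S, x, alpha):
    """Polynomial graph filter z = sum_k alpha[k] S^k x, as in Eq. (6)."""
    return sum(a * np.linalg.matrix_power(S, k) @ x
               for k, a in enumerate(alpha))

# Numerical check of equivariance: permuting the graph and the input
# permutes the filter output in exactly the same way.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
alpha = [0.5, 1.0, -0.3]
P = np.eye(4)[[2, 0, 3, 1]]           # a permutation matrix

lhs = graph_filter(P @ H @ P.T, P @ x, alpha)
rhs = P @ graph_filter(H, x, alpha)
assert np.allclose(lhs, rhs)
```

Because each REGNN layer composes such a filter with an entrywise non-linearity, the same identity propagates through the whole network.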

By (15), the meta-learning objective in (12) is permutation invariant in the sense that, for a permutation matrix $\Pi$ and any realizations of the channel matrices $\{H_t(n)\}$, we have

$$\hat{C}\big(\theta_t(\theta) \,\big|\, \tilde{\mathcal{D}}_t^{\text{te}}\big) = \hat{C}\big(\theta_t(\theta) \,\big|\, \mathcal{D}_t^{\text{te}}\big), \tag{16}$$

where $\tilde{\mathcal{D}}_t^{\text{te}} = \{\Pi^T H_t(n) \Pi\}_{n \in \mathcal{N}_t^{\text{te}}}$ collects the permuted channel matrices. As a consequence of the invariance of the objective in (16), the initialization produced by MAML in (13) is also invariant to permutations.

1:procedure Offline meta-training
2:     Initialize filter taps $\theta$
3:     for each meta-training iteration do
4:         Select a batch of meta-training periods $t$
5:         for each selected period $t$ do
6:              Update filter taps $\theta_t(\theta)$ using data set $\mathcal{D}_t^{\text{tr}}$ according to (11)
7:         Update shared initialization $\theta$ using data sets $\{\mathcal{D}_t^{\text{te}}\}$ according to (14)       return shared initialization $\theta$
8:procedure Adaptation at runtime
9:     for each adaptation step do
10:         Update filter taps using data set $\mathcal{D}_{T+1}^{\text{tr}}$ according to (11)
Algorithm 1 Power allocation via black-box meta-learning (FOMAML)

VI Modular Meta-Learning

The black-box meta-learning method described in the previous section aims at fast parametric generalization, sharing an initialization of the model parameters across periods. In this section, we propose a modular approach that aims at combinatorial generalization, finding a set of reusable modules that can form components of a solution for a new period. The distinction between the two approaches is illustrated in Fig. 3. As seen in the figure, in modular meta-learning, the adaptation algorithm selects the filters to be applied at each layer of the REGNN (8) from a shared module set $\mathcal{M}$, representing a repository of filter taps. The key idea is that the module set $\mathcal{M}$ is optimized during meta-training, while it is fixed at runtime, enabling an efficient adaptation based on limited data via the selection of modules from $\mathcal{M}$.

VI-A Modular Meta-learning

A module assignment is a mapping between the $L$ layers of the REGNN and the modules from the module set $\mathcal{M} = \{\alpha_1, \ldots, \alpha_M\}$, where each module $\alpha_m$ is a vector of $K$ filter taps. Mathematically, the assignment is an $L$-dimensional vector $s_t$, with the $l$-th element $s_{t,l} \in \{1, \ldots, M\}$ indicating the module assigned to layer $l$ at period $t$. Thereby, the assignment vector $s_t$ can take $M^L$ possible values. Let us represent the categorical variable $s_{t,l}$ using a one-hot representation $v_{t,l} = [v_{t,l,1}, \ldots, v_{t,l,M}]^T$, in which $v_{t,l,m} = 1$ if $s_{t,l} = m$, and $v_{t,l,m} = 0$ otherwise. With this definition, we can write the output (7) of layer $l$ of the modular REGNN as

$$z_l = \sigma\left(\sum_{m=1}^{M} v_{t,l,m} \sum_{k=0}^{K-1} \alpha_{m,k} H^k z_{l-1}\right). \tag{17}$$

Using a recursive application of (17), for a given module set $\mathcal{M}$ and module assignment vector $s_t$, the transmit power can be found as the output of the modular REGNN as

$$p = \phi\big(H; \mathcal{M}, s_t\big) = D_{\max} \, \sigma\left(\sum_{m=1}^{M} v_{t,L,m} \sum_{k=0}^{K-1} \alpha_{m,k} H^k z_{L-1}\right). \tag{18}$$

The objective during meta-training is to optimize a module set $\mathcal{M}$ that allows the system to find a combination of effective modules for any new topology during deployment. This is done by formulating problem (10) as the maximization

$$\max_{\mathcal{M}} \;\; \sum_{t=1}^{T} \hat{C}\big(\mathcal{A}(\mathcal{D}_t^{\text{tr}} \,|\, \mathcal{M}) \,\big|\, \mathcal{D}_t^{\text{te}}\big) \tag{19}$$

over the module set $\mathcal{M}$, where the learning algorithm $\mathcal{A}(\cdot \,|\, \mathcal{M})$ selects the best possible assignment from set $\mathcal{M}$ given CSI data $\mathcal{D}_t^{\text{tr}}$. Accordingly, the training algorithm is given as a function of the module set $\mathcal{M}$ as

$$\mathcal{A}\big(\mathcal{D}_t^{\text{tr}} \,|\, \mathcal{M}\big) = \big(\mathcal{M}, s_t^*\big), \tag{20}$$

where the optimized assignment vector $s_t^*$ is

$$s_t^* = \arg\max_{s_t} \; \hat{C}\big(\mathcal{M}, s_t \,\big|\, \mathcal{D}_t^{\text{tr}}\big). \tag{21}$$
VI-B Determining the Module Assignment

The optimization (19) is a mixed continuous-discrete problem over the module set $\mathcal{M}$ and the assignment variables $\{s_t\}$. To address this challenging problem, we define a stochastic module assignment function given by the conditional distribution $q(s_t \,|\, \mathcal{M}, \mathcal{D}_t^{\text{tr}})$. This distribution assigns probabilities to each one of the $M^L$ possible assignment vectors $s_t$, given the module set $\mathcal{M}$ and the training data $\mathcal{D}_t^{\text{tr}}$ for the current period $t$. We can now redefine the bi-level optimization problem in (19) as

$$\max_{\mathcal{M}} \;\; \sum_{t=1}^{T} \mathbb{E}_{s_t \sim q_t^*}\left[\hat{C}\big(\mathcal{M}, s_t \,\big|\, \mathcal{D}_t^{\text{te}}\big)\right], \quad \text{with} \quad q_t^* = \arg\max_{q} \; \mathbb{E}_{s_t \sim q}\left[\hat{C}\big(\mathcal{M}, s_t \,\big|\, \mathcal{D}_t^{\text{tr}}\big)\right], \tag{22}$$

where the inner optimization is over the distributions $q(s_t \,|\, \mathcal{M}, \mathcal{D}_t^{\text{tr}})$. Problems (22) and (19) are equivalent in the sense that they have the same solution. This is because the optimal distributions concentrate at the optimal module assignment vector (21). As detailed next, we propose to leverage the reparametrization trick to tackle the stochastic optimization in (22) via SGD.

To start, we model the module assignment distribution by using a mean-field factorization across the $L$ layers of the REGNN, i.e.,

$$q\big(s_t \,\big|\, \mathcal{M}, \mathcal{D}_t^{\text{tr}}\big) = \prod_{l=1}^{L} q_l(s_{t,l}), \tag{23}$$

where $q_l(s_{t,l})$ is the probability that module $s_{t,l}$ is assigned to layer $l$. This does not affect the equivalence of problems (19) and (22), since the deterministic solution given by (21) can be realized by (23). Then, we let $\nu_l = [\nu_{l,1}, \ldots, \nu_{l,M}]^T$ be the vector of logits that parametrize the assignment probabilities through the softmax function as

$$q_l(s_{t,l} = m) = \frac{\exp(\nu_{l,m})}{\sum_{m'=1}^{M} \exp(\nu_{l,m'})}. \tag{24}$$
The Gumbel-Max trick [gumbel1954statistical], [hazan2012partition], [maddison2014sampling] provides a simple and efficient way to draw a sample from a categorical distribution with logits $\nu_l$ as

$$v_{t,l,m} = \mathbb{1}\left(m = \arg\max_{m'} \{\nu_{l,m'} + g_{m'}\}\right), \tag{25}$$

where $\mathbb{1}(\cdot)$ denotes the indicator function, which equals one if the argument is true, and zero otherwise; and $g_1, \ldots, g_M$ represent independent Gumbel variables obtained as

$$g_m = -\log(-\log(u_m)), \tag{26}$$

with $u_1, \ldots, u_M$ being independent uniform random variables, i.e., $u_m \sim \mathcal{U}(0, 1)$. Thereby, using the Gumbel-Max trick (25), the sampling of a discrete random variable is reduced to applying a deterministic function of the parameters $\{\nu_l\}$ to noise variables drawn from a fixed distribution.

The argmax operation in (25) is not differentiable, making the optimization of the parameter vectors $\{\nu_l\}$ via SGD infeasible. To address this issue, references [maddison2016concrete], [jang2016categorical] adopt the softmax function as a continuous, differentiable approximation. Samples from the resulting concrete distribution can be drawn according to

$$\tilde{v}_{t,l,m} = \frac{\exp\big((\nu_{l,m} + g_m)/\beta\big)}{\sum_{m'=1}^{M} \exp\big((\nu_{l,m'} + g_{m'})/\beta\big)}, \tag{27}$$

where the variables $g_m$ are drawn according to (26). The temperature parameter $\beta > 0$ controls the extent to which the random vector $\tilde{v}_{t,l}$ resembles the one-hot representation (25): As the temperature $\beta$ tends to zero, the sample $\tilde{v}_{t,l}$ becomes identical to $v_{t,l}$.
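The sampling steps (26)-(27) amount to a few lines of code. The sketch below uses illustrative names; the key point is that the output is a deterministic, differentiable function of the logits once the Gumbel noise is drawn.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=0.5, rng=None):
    """Concrete / Gumbel-softmax sample, as in Eq. (27): a differentiable
    relaxation of drawing a one-hot module assignment with given logits."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(1e-9, 1.0, size=len(logits))   # u_m ~ U(0, 1)
    g = -np.log(-np.log(u))                        # Gumbel(0, 1) noise, Eq. (26)
    y = (np.asarray(logits, dtype=float) + g) / temperature
    y = y - y.max()                                # numerical stability
    e = np.exp(y)
    return e / e.sum()                             # soft one-hot vector
```

At a high temperature the result spreads over all modules; as the temperature shrinks, the sample approaches the hard one-hot draw of (25).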

Regardless of the value of the temperature $\beta$, substituting the relaxed samples (27) for the hard assignments allows us to address the inner optimization problem in (22) over the assignment probabilities. To this end, the objective in (22) is estimated by drawing samples of the Gumbel variables in (26) and plugging (27) into the objective function in (22). As a result, we obtain a function that is differentiable with respect to the parameters $\{\nu_l\}$, which can now be optimized via SGD.

To elaborate, consider for simplicity a single sample $\{g_m\}$ of the Gumbel random variables in (26). For a fixed set $\mathcal{M}$, the inner optimization problem in (22) can be written as

$$\max_{\{\nu_l\}} \;\; \hat{C}\big(\mathcal{M}, \tilde{v}_t(\{\nu_l\}) \,\big|\, \mathcal{D}_t^{\text{tr}}\big), \tag{28}$$

where we have defined $\tilde{v}_t(\{\nu_l\})$ as the collection of the relaxed assignment variables (27) across the $L$ layers. The gradient of (28) with respect to $\{\nu_l\}$ can be easily calculated to carry out the updates of the inner problem in (22). For later reference, a single step of gradient descent, given the current module set $\mathcal{M}$, yields the update

$$\nu_l \leftarrow \nu_l + \eta \nabla_{\nu_l} \hat{C}\big(\mathcal{M}, \tilde{v}_t(\{\nu_l\}) \,\big|\, \mathcal{D}_t^{\text{tr}}\big), \quad l = 1, \ldots, L, \tag{30}$$

where $\eta > 0$ denotes the learning rate.

Tackling the outer optimization problem in (22) is more challenging. Specifically, the optimal parameters of the assignment distribution, e.g., those obtained via (30), are a function of the module set, and hence updating the set $\mathcal{M}$ would also require the partial derivative, with respect to the module parameters, of the logits optimized in the inner maximization in (22). However, in a manner similar to FOMAML (and other first-order black-box methods such as [nichol2018firstorder]), we ignore the higher-order derivatives and update the parameters in the module set as

$$\alpha_m \leftarrow \alpha_m + \kappa \sum_{t=1}^{T} \nabla_{\alpha_m} \hat{C}\big(\mathcal{M}, \tilde{v}_t(\{\nu_l\}) \,\big|\, \mathcal{D}_t^{\text{te}}\big), \quad m = 1, \ldots, M, \tag{31}$$

where $\kappa > 0$ denotes the learning rate and the gradient with respect to the module parameters is computed at the previous iterate $\{\nu_l\}$. Using (30) and (31), we can address (22) by alternating between optimizing the assignment probability given the current module set, and optimizing the module parameters given the optimized assignment probability.
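One round of this alternating scheme can be sketched as follows. The gradient callables are illustrative stand-ins for autodiff gradients of the relaxed objective; no second-order terms are computed, mirroring the first-order approximation described above.

```python
import numpy as np

def modular_meta_step(modules, nu, grad_nu, grad_modules,
                      lr_nu=0.1, lr_mod=0.01):
    """One round of the alternating updates (30)-(31): first ascend the
    assignment logits `nu` for the current module set, then ascend the
    module parameters at the updated logits, ignoring higher-order terms.

    grad_nu(nu, modules)      -> gradient of the relaxed objective w.r.t. nu
    grad_modules(nu, modules) -> list of gradients, one per module
    """
    nu = nu + lr_nu * grad_nu(nu, modules)                     # Eq. (30)
    modules = [m + lr_mod * g
               for m, g in zip(modules, grad_modules(nu, modules))]  # Eq. (31)
    return modules, nu
```

Repeating this step over the meta-training periods yields the optimized module set, while at runtime only the logit update is retained.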

VI-C Optimization During Runtime

During meta-testing, we consider the obtained module set $\mathcal{M}$ as fixed. Using the training portion $\mathcal{D}_{T+1}^{\text{tr}}$ of the meta-test data set, we only optimize the parameters $\{\nu_l\}$ of the distribution $q(\cdot)$ using (30), or, more practically, multiple gradient descent steps. The final REGNN is constructed by using the mode of the assignment distribution as

$$s_{T+1,l}^* = \arg\max_{m} \; \nu_{l,m}, \quad l = 1, \ldots, L, \tag{32}$$

yielding the REGNN

$$p = \phi\big(H; \mathcal{M}, s_{T+1}^*\big). \tag{33}$$
Modular meta-learning is summarized in Algorithm 2.

VI-D Permutation Equivariance and Invariance

The modular nature of the REGNN in (18) does not violate the equivariance properties of the individual filters, and of the module set by extension. To elaborate, observe that a single element of the assignment vector is non-zero for each layer, and, as a result, the output of the individual layers (17) is equivalent to (7), whose equivariance properties have been established in [eisen2020optimal]. Therefore, the composition in (18) is also equivariant, as in (15), and the objective in (19) is invariant to permutations for any realization of the channel matrix, as in (16). We conclude that the optimal module set is invariant to permutations. In other words, any relabelling of the transmitters in the network will produce the same permutation of the power allocation without any modification of the taps in the module set.

1:procedure Offline meta-training
2:     Initialize module set
3:     for  do
4:         Select meta-training periods
5:         for  do
6:              Update the distribution parameters using data set according to (30)
7:         for  do
8:              Update module parameters using data set according to (31)
9:     return module set
10:procedure Adaptation at runtime
11:     for  do
12:         Update the distribution parameters using data set according to (30)
13:     Select module assignment using (32)
Algorithm 2 Power allocation via modular meta-learning

Vii Experiments

In this section, we provide numerical results illustrating the advantages of black-box and modular meta-learning for power control in distributed wireless networks.

Vii-a Network and Channel Model

As in [eisen2020optimal], a random geometric graph in two dimensions comprising nodes is drawn in each period by dropping each transmitter uniformly at random at location , with its paired receiver at location . Given the geometric placement, the fading channel state between transmitter and receiver is given by


where the subscript p denotes the path-loss gain, which is invariant during a period , and the subscript f denotes the fast-fading component, which depends on the time slot . The constant path-loss gain is given as , where the path-loss exponent is set to . The fast-fading component is random, and is drawn i.i.d. over indices and according to . Hence, fading conditions change at each time slot , and the instantaneous channel information is used by the model to generate the power allocation. The noise power is set to dBm, and the maximum transmit power is set to dBm for all devices. The corresponding maximum average SINR over the topology generation is


where is a uniform random variable, i.e., , and follows from applying Cavalieri's quadrature formula. The large SINR value implies that the system operates in the interference-limited regime, justifying the need for optimized power control policies. All details of the network and channel model are summarized in Table I in Appendix B.
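The drop-and-fade procedure above can be sketched as follows. The number of pairs, the placement radius, and the path-loss exponent used here are illustrative placeholders (the values used in the experiments are listed in Table I): the path-loss gains are fixed for the duration of a period, while the Rayleigh fast-fading component is redrawn i.i.d. at every slot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10         # number of transmitter-receiver pairs (illustrative)
gamma = 2.2    # path-loss exponent (illustrative; see Table I)

# Random geometric drop: transmitters uniform in the unit square, each paired
# receiver placed at a small random offset (illustrative placement radius).
tx = rng.uniform(0.0, 1.0, size=(n, 2))
rx = tx + rng.uniform(-0.1, 0.1, size=(n, 2))

# Distance between every transmitter i and receiver j; the diagonal holds the
# direct links, the off-diagonal entries the interference links.
d = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=-1)
h_p = d ** (-gamma)   # path-loss gains, fixed for the duration of a period

def draw_channel(h_p, rng):
    # Fast fading redrawn i.i.d. at every slot and for every link
    # (Rayleigh-distributed magnitude of a complex Gaussian).
    h_f = np.abs(rng.normal(size=h_p.shape) + 1j * rng.normal(size=h_p.shape)) / np.sqrt(2)
    return h_p * h_f

H = draw_channel(h_p, rng)   # instantaneous channel matrix for one slot
```

Calling `draw_channel` repeatedly with the same `h_p` produces the sequence of per-slot channel matrices within one period; a new geometric drop starts the next period.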

Vii-B Model Architecture and Hyperparameters

As in [naderializadeh2020wireless], we consider a REGNN comprising hidden layers, each containing a filter of size . The non-linearity in (7) and (8) is a ReLU, given by , except at the output layer, where we use a sigmoid. Unless stated otherwise, the number of modules is set to . In all experiments, we set the input signal to an all-one vector. We define an annealing schedule for the temperature in (27) over epochs, whereby the temperature is decreased in every epoch by , until it reaches a predetermined minimal value, set to as in [jang2016categorical]. All model hyper-parameters are summarized in Table II in Appendix B.
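The temperature schedule can be sketched as below, using a Gumbel-softmax relaxation in the style of [jang2016categorical] and assuming a multiplicative per-epoch decay; the decay factor and floor used here are illustrative, with the actual values given in Table II.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau, rng):
    # Relaxed one-hot sample from a categorical distribution [jang2016categorical].
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u + 1e-20) + 1e-20)   # Gumbel(0, 1) noise
    z = (logits + g) / tau
    z -= z.max()                              # for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([1.0, 0.5, 0.1])            # assignment logits (illustrative)
tau, decay, tau_min = 1.0, 0.95, 0.1          # illustrative schedule; see Table II
taus = []
for epoch in range(100):
    sample = gumbel_softmax(logits, tau, rng) # one relaxed assignment draw
    # ... a training step on the relaxed assignment would go here ...
    tau = max(tau * decay, tau_min)           # decrease the temperature each epoch
    taus.append(tau)
```

At high temperature the samples are close to uniform, easing exploration of module assignments; as the temperature approaches its floor, the samples concentrate near one-hot vectors, matching the hard assignment used at runtime.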

Vii-C Data sets

We study the case in which the number of nodes in the network, , is fixed but the topology changes across periods, as well as the case in which the number of nodes in the network is also time-varying.

Vii-C1 Fixed network size

In the first scenario, for a fixed number of links , each meta-training data set corresponds to a realization of the random drop of the transmitter-receiver pairs at period . Each drop is then run for slots, with the fading coefficients sampled i.i.d. at each slot.

Vii-C2 Dynamic network size

In the second scenario, the size of the network is chosen uniformly at random as . Each meta-training data set corresponds to a realization of the network size and to a random drop of the transmitter-receiver pairs as discussed above.

In both scenarios, unless stated otherwise, we set the number of meta-training periods to , and the training and the testing portions of the data set contain slots each. The meta-learning hyper-parameters are summarized in Table III in Appendix B.

Vii-D Schemes and Benchmarks

We compare the performance of the following schemes:

Vii-D1 Joint learning [eisen2020optimal]

Adopted in [eisen2020optimal], joint learning pools together all tasks, i.e., the data sets for all periods , in order to address problem (9) with an additional outer sum over the periods . The model parameters are then fine-tuned at runtime using the samples in the data set .

Vii-D2 Black-box meta-learning (Black-box ML)

As a representative black-box meta-learning method, we investigate the performance of FOMAML, as detailed in Algorithm 1. The number of gradient descent updates for both the task-specific and the shared parameters is set to .

Vii-D3 Modular meta-learning (Modular ML)

We consider the proposed modular meta-learning method, as detailed in Algorithm 2. The number of gradient descent updates for the assignment and the module parameters are set to and , respectively.

Vii-E Results

Vii-E1 Runtime adaptation speed

To start, we evaluate the sample requirements for adaptation to the new, meta-test topology at runtime by plotting the achievable sum-rate as a function of the size of the data set in Fig. 4. We consider the more challenging case of networks with dynamic size. Fig. 4 confirms that meta-learning can adapt quickly to a new topology, using far fewer samples than joint learning [eisen2020optimal]. This validates the application of meta-learning to challenging communication problems like power control. Furthermore, modular meta-learning with both and modules is observed to outperform black-box methods when few adaptation samples are available, with the caveat that a single adaptation sample is insufficient to determine a suitable module assignment when the number of modules is sufficiently large (here ). This points to the benefits of a stronger (meta-)inductive bias in the regime where data availability is very limited. In particular, in modular ML, the adaptation samples are only used to determine the module assignment at runtime, and not to optimize the module parameters. As the number of samples for adaptation increases, the number of required modules grows, and eventually black-box ML becomes advantageous. Overall, the results in Fig. 4 reveal a tension between the sample efficiency of modular ML and the flexibility of black-box methods.

Figure 4: Achievable sum-rate as a function of the number of samples used for adaptation. The number of training and testing samples for each task is set to