Power spectral density (PSD) cartography relies on sensor measurements to map the radiofrequency (RF) signal power distribution over a geographical region. The resulting maps are instrumental for various wireless network management tasks, such as power control, interference mitigation, and planning [21, 12]. For instance, PSD maps benefit wireless network planning by revealing the location of crowded regions and areas of weak coverage, thereby suggesting where new base stations should be deployed. Because they characterize how RF power distributes per channel across space, PSD maps are also useful to increase frequency reuse and mitigate interference in cellular systems. In addition, PSD maps enable opportunistic transmission in cognitive radio systems by unveiling underutilized “white spaces” in time, space, and frequency [5, 42]. Different from conventional spectrum sensing techniques, which assume a common spectrum occupancy over the entire sensed region [28, 3, 25], PSD cartography accounts for variability across space and therefore enables a more aggressive spatial reuse.
A number of interpolation and regression techniques have been applied to construct RF power maps from power measurements. Examples include kriging, compressive sampling , dictionary learning [23, 22], sparse Bayesian learning , and matrix completion . These maps describe how power distributes over space but not over frequency. To accommodate variability along the frequency dimension as well, a basis expansion model was introduced in [6, 13, 8] to create PSD maps from periodograms. To alleviate the high power consumption and bandwidth needs that stem from obtaining and communicating these periodograms, a low-overhead sensing scheme based on single-bit data was proposed along the lines of . However, this scheme assumes that the PSD is constant across space.
To summarize, existing spectrum cartography approaches either construct power maps from power measurements or PSD maps from PSD measurements. In contrast, the main contribution of this paper is to present algorithms capable of estimating PSD maps from power measurements, thus attaining a more efficient extraction of the information contained in the observations than existing methods. Therefore, the proposed approach enables the estimation of the RF power distribution over frequency and space using low-cost low-power sensors since only power measurements are required.
To facilitate practical implementations with sensor networks, where the communication bandwidth is limited, overhead is reduced by adopting two measures. First, sensor measurements are quantized to a small number of bits. Second, the available prior information is efficiently captured in both frequency and spatial domains, thus affording strong quantization while minimally sacrificing the quality of map estimates.
Specifically, a great deal of frequency-domain prior information about impinging communication waveforms can be collected from spectrum regulations and standards, which specify bandwidth, carrier frequencies, transmission masks, roll-off factors, number of sub-carriers, and so forth [38, 31]. To exploit this information, a basis expansion model is adopted, which allows the estimation of the power of each sub-channel and background noise as a byproduct. The resulting estimates can be employed to construct signal-to-noise ratio (SNR) maps, which reveal weak coverage areas, and alleviate the well-known noise uncertainty problem in cognitive radio .
To incorporate varying degrees of prior information in the spatial domain, nonparametric and semiparametric estimators are developed within the framework of kernel-based learning for vector-valued functions . Nonparametric estimates are well suited for scenarios where no prior knowledge about the propagation environment is available due to their ability to approximate any spatial field with arbitrarily high accuracy . In many cases, however, one may approximately know the transmitter locations, the path loss exponent, or even the locations of obstacles or reflectors. The proposed semiparametric estimators capture these forms of prior information through a basis expansion in the spatial domain.
Although the proposed estimators can be efficiently implemented in batch mode, limited computational resources may hinder real-time operation if an extensive set of measurements is to be processed. This issue is mitigated here through an online nonparametric estimation algorithm based on stochastic gradient descent. Remarkably, the proposed algorithm can also be applied to (vector-valued) function estimation setups besides spectrum cartography.
The present paper also contains two theoretical contributions to machine learning. First, a neat connection is established between robust (possibly vector-valued) function estimation from quantized measurements and support vector machines (SVMs)[32, 34, 35]. Through this link, theoretical guarantees and efficient implementations of SVMs permeate to the proposed methods. Second, the theory of kernel-based learning for vector-valued functions is extended to accommodate semiparametric estimation. The resulting methods are of independent interest since they can be applied beyond the present spectrum cartography context and subsume, as a special case, thin-plate splines regression [40, 8].
The rest of the paper is organized as follows. Sec. II presents the system model and formulates the problem. Sec. III proposes nonparametric and semiparametric PSD map estimation algorithms operating in batch mode, whereas Sec. IV develops an online solver. Finally, Sec. V presents some numerical tests and Sec. VI concludes the paper with closing remarks and research directions.
Notation. The cardinality of set is denoted by . Scalars are denoted by lowercase letters, column vectors by boldface lowercase letters, and matrices by boldface uppercase letters. Superscript stands for transposition, and for conjugate transposition. The th entry (th column) of matrix is denoted by (). The Kronecker product is represented by the symbol . The Khatri-Rao product is defined for and as . The entrywise or Hadamard product is defined as . Vector is the -th column of the identity matrix , whereas and are the vectors of dimension with all zeros and ones, respectively. Symbol denotes expectation, probability, trace, largest eigenvalue, and convolution. Notation (alternatively ) represents the smallest (largest) integer satisfying ().
II System Model and Problem Statement
Consider transmitters located in a geographical region , where is typically (one may set it differently, e.g., for maps along roads or railways, or for applications involving aerial vehicles or urban environments). Let denote the transmit-PSD of the -th transmitter and let represent the gain of the channel between the -th transmitter and location , which is assumed frequency flat to simplify the exposition; see Remark 1. If the transmitted signals are uncorrelated, the PSD at location is given by
where is the noise power and is the noise PSD, normalized to . In view of (1), one can also normalize , , without any loss of generality to satisfy by absorbing any scaling factor into . Since such a scaling factor equals the transmit power, the coefficient actually represents the power of the -th signal at location .
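For concreteness, the model in (1) can be sketched numerically. The unit-area rectangular transmit-PSD shapes and the inverse-square path gain below are illustrative assumptions (the paper leaves both generic), as are all function names:

```python
import numpy as np

def rect_psd(f, fc, bw):
    """Unit-area rectangular transmit-PSD of bandwidth bw centered at fc (toy shape)."""
    return np.where(np.abs(f - fc) <= bw / 2, 1.0 / bw, 0.0)

def psd_at_location(f, x, tx_locs, tx_powers, centers, widths, noise_power):
    """PSD at location x: channel-gain-weighted sum of normalized transmit-PSDs
    plus a flat, unit-area noise PSD, mirroring the structure of (1)."""
    phi = np.full(f.shape, noise_power / (f.max() - f.min()))  # normalized noise PSD
    for loc, p, fc, bw in zip(tx_locs, tx_powers, centers, widths):
        # Toy frequency-flat path gain; the paper leaves the channel model generic.
        gain = 1.0 / (1.0 + np.linalg.norm(np.asarray(x) - np.asarray(loc)) ** 2)
        phi += p * gain * rect_psd(f, fc, bw)
    return phi
```

Because the transmit-PSDs and the noise PSD are normalized to unit area, integrating the map over frequency recovers the total received power plus the noise power, consistent with the interpretation of the coefficients as per-signal received powers.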
Often in practice, the normalized PSDs are approximately known since transmitters typically adhere to publicly available standards and regulations, which prescribe spectral masks, bandwidths, carrier frequencies, roll-off factors, number of pilots, and so on [38, 31]. If unknown, the methods here carry over after setting to be a frequency-domain basis expansion model [6, 8]. For this reason, the rest of the paper assumes that are known.
The goal is to estimate the PSD map from the measurements gathered by sensors with locations . In view of (1), this task is tantamount to estimating at every spatial coordinate .
To minimize hardware costs and power consumption, this paper adopts the sensor architecture in Fig. 1. The ensemble power at the output of the filter of the -th sensor is , where denotes the frequency response of the receive filter. From (1), it follows that
where can be thought of as the contribution of the -th transmitter per unit of to the power at the output of the receive filter of the -th sensor; whereas and are introduced to simplify notation. The -th sensor obtains an estimate of by measuring the signal power at the output of the filter over a certain time window. In general, this estimate differs from the true power due to the finite length of this observation window.
Different from the measurement model in [6, 13, 8], where sensors obtain and communicate periodograms, the proposed scheme solely involves power measurements, thereby reducing sensor costs and bandwidth requirements. To further reduce bandwidth, can be quantized as illustrated in the bottom part of Fig. 1. When uniform quantization is used, the sensors obtain
where is the quantization step; see also Remark 5. If denotes the number of quantization levels, which depends on and the range of , the number of bits needed is now just . Depending on how accurate is, either or . The latter event is termed measurement error and is due to the finite length of the aforementioned time window.
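A minimal sketch of the quantization step in (3) and of the resulting bit budget, assuming a mid-tread uniform quantizer with step `step` over a range `[0, z_max]` (the paper's exact convention may differ):

```python
import numpy as np

def quantize(z, step):
    """Uniform quantizer: map the power measurement z to the centroid of its
    quantization interval, i.e. round(z / step) * step (cf. (3))."""
    return np.round(np.asarray(z) / step) * step

def quantization_interval(q, step):
    """Interval conveyed by a quantized value q: z lies in [q - step/2, q + step/2]."""
    return q - step / 2, q + step / 2

def bits_needed(z_max, step):
    """Bits per measurement: ceil(log2(L)), with L quantization levels
    covering the measurement range [0, z_max]."""
    levels = int(np.floor(z_max / step)) + 1
    return int(np.ceil(np.log2(levels)))
```

The interval information returned by `quantization_interval` is exactly what the quantized-data estimators of Sec. III-B exploit.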
Finally, the sensors communicate the measurements or to the fusion center. Given these measurements, together with and , the problem is to estimate at every . The latter can be viewed individually as functions of the spatial coordinate , or, altogether as a vector field , where . Thus, estimating the PSD map in (1) is in fact a problem of estimating a vector-valued function.
Remark 1 (Frequency-selective channels).
If the channels are not frequency-flat, then each term in the sum of (1) can be decomposed into multiple components of smaller bandwidth in such a way that each one sees an approximately frequency-flat channel. To ensure that these components are uncorrelated, one can choose their frequency supports to be disjoint.
Sensors can also operate digitally. In this case, one could implement the receive filters to have pseudo-random impulse responses . Selecting distinct seeds for the random number generators of different sensors yields linearly independent with a high probability, which ensures identifiability of (cf. (2)).
If a wideband map is to be constructed, then Nyquist-rate sampling may be too demanding for low-cost sensors. In this scenario, one can replace the filter in Fig. 1 with an analog-to-information-converter (A2IC) [37, 31]. To see that (2) still holds and therefore the proposed schemes still apply, let represent the compression matrix of the th sensor, which multiplies raw measurement blocks to yield compressed data blocks . The ensemble power of the latter is proportional to , where denotes the covariance matrix of the uncompressed data blocks, and the covariance matrix of the blocks transmitted from the -th transmitter. Combining both equalities yields (2) upon defining .
III Learning PSD Maps
This section develops various PSD map estimators offering different bandwidth-performance trade-offs. First, Sec. III-A puts forth a nonparametric estimator to recover from un-quantized power measurements. To reduce bandwidth requirements, this approach is extended in Sec. III-B to accommodate quantized data. The detrimental impact of strong quantization on the quality of map estimates is counteracted in Sec. III-C by leveraging propagation prior information. For simplicity, these methods are presented for the scenario where each sensor obtains a single measurement, whereas general versions accommodating multiple measurements per sensor are outlined in Sec. III-E.
III-A Estimation via nonparametric regression
This section reviews the background in kernel-based learning necessary to develop the cartography tools in the rest of the paper and presents an estimator to obtain PSD maps from un-quantized measurements.
Kernel-based regression seeks estimates among a wide class of functions termed reproducing kernel Hilbert space (RKHS). In the present setting of vector-valued functions, such an RKHS is given by [26, 11], where are expansion coefficient vectors and is termed reproducing kernel map. The latter is any matrix-valued function that is (i) symmetric, meaning that for any and ; and (ii) positive (semi)definite, meaning that the square matrix having as its block is positive semi-definite for any . Remark 4 guides the selection of functions qualifying as reproducing kernels.
As any Hilbert space, an RKHS has an associated inner product, not necessarily equal to the classical . Specifically, the inner product between two RKHS functions and can be obtained through the reproducing kernel as
This expression is referred to as the reproducing property and is of paramount importance since it allows the computation of function inner products without integration. From (4), the induced RKHS norm of can be written as and is widely used as a proxy for smoothness of .
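For a function given by a finite kernel expansion, the reproducing property turns the RKHS norm into a quadratic form in the expansion coefficients, with no integration involved. A scalar-kernel sketch (names are illustrative):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=0.5):
    """Gram matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rkhs_norm_sq(alpha, K):
    """Squared RKHS norm of f = sum_i alpha_i k(., x_i): by the reproducing
    property, <f, f> equals the quadratic form alpha' K alpha."""
    return float(alpha @ K @ alpha)
```

Positive semidefiniteness of the kernel is what guarantees this quadratic form, and hence the norm, is nonnegative.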
Kernel-based methods confine their search for estimates to functions in , which is not a limitation since an extensive class of functions, including any continuous function vanishing at infinity, can be approximated with arbitrary accuracy by a function in for a properly selected kernel . However, function estimation is challenging since (i) any finite set of samples generally admits infinitely many interpolating functions in , and (ii) the estimate does not generally approach the estimated function, even for an arbitrarily large number of noisy samples, if the latter are overfitted. To mitigate both issues, kernel regression seeks estimates minimizing the sum of two terms, where the first penalizes estimates deviating from the observations and the second promotes smoothness. Specifically, if
denotes a loss function of the measurement error, the nonparametric kernel-based regression estimate of is 
where the user-selected scalar controls the tradeoff between fitting the data and smoothness of the estimate, captured by its RKHS norm.
Since is infinite-dimensional, solving (5) directly is generally not possible with a finite number of operations. Fortunately, the so-called representer theorem (see e.g., [32, 2]) asserts that in (5) is of the form
for some , where is of size , and is . In words, the solution to (5) admits a kernel expansion around the sensor locations . From (6), it follows that finding amounts to finding , but the latter can easily be obtained by solving the problem that results from substituting (6) into (5):
The loss function is typically chosen to be convex. The simplest example is the squared loss , with the resulting referred to as the kernel ridge regression estimate. Defining , , and , expression (7) becomes
Besides its simplicity of implementation, the estimate with as in (8) offers a twofold advantage over existing cartography schemes. First, existing estimators relying on power measurements can only construct power maps [1, 19, 18, 23, 22], whereas the proposed method is capable of obtaining PSD maps from the same measurements. Second, existing methods for estimating PSD maps require PSD measurements; that is, every sensor must obtain and transmit periodograms to the fusion center, which necessitates a higher communication bandwidth, longer sensing time, and costlier sensors than required here [6, 13, 8].
The choice of the kernel considerably affects the estimation performance when the number of observations is small. Thus, it is important to choose a kernel that is well-suited to the spatial variability of the true . To do so, one may rely on cross validation, historical data [32, Sec. 2.3], or multi-kernel approaches . Although specifying matrix-valued kernels is more challenging than specifying their scalar counterparts () , a simple but effective possibility is to construct a diagonal kernel as where are valid scalar kernels. For example, can be the popular Gaussian kernel , where is user selected.
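The closed form in (8) can be illustrated in the simpler scalar-valued setting with a Gaussian kernel. This sketch assumes direct noisy samples of the function rather than the paper's power measurements, and all names are illustrative:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_ridge_fit(X, z, mu, sigma):
    """Expansion coefficients of the representer-theorem solution for the
    squared loss: solve (K + mu * N * I) alpha = z, a scalar analogue of (8)."""
    N = len(z)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * N * np.eye(N), z)

def kernel_ridge_predict(X_train, alpha, X_test, sigma):
    """Evaluate the kernel expansion (as in (6)) at new locations."""
    return gaussian_kernel(X_test, X_train, sigma) @ alpha
```

The regularization weight trades data fit against the RKHS-norm penalty exactly as in (5): shrinking it toward zero drives the estimate toward interpolation of the samples.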
III-B Nonparametric regression from quantized data
The scheme in Sec. III-A offers a drastic bandwidth reduction relative to competing PSD map estimators since only scalar-valued measurements need to be communicated to the fusion center. The methods in this section accomplish a further reduction by accommodating quantized measurements.
Recall from Sec. II that denote the result of uniformly quantizing the power measurements ; see also Remark 5. The former essentially convey interval information about the latter, since (3) implies that is contained in the interval , where . Note that is in fact the centroid of the -th quantization interval.
To account for the uncertainty within such an interval, one can replace in (5) with as
and set to assign no cost to candidate functions that lead to values of falling around . In other words, such an only penalizes functions for which falls outside of . Examples of these -insensitive loss functions include [32, 34] and the less known . Incidentally, these functions endow the proposed estimators with robustness to outliers and promote sparsity in the expansion coefficients, which is a particularly well-motivated property when the number of measurement errors is small relative to , that is, when for most values of .
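The standard -insensitive loss, together with a quadratic variant (shown here for illustration; it may differ from the specific loss the text cites), can be written as:

```python
import numpy as np

def eps_insensitive(r, eps):
    """Vapnik's epsilon-insensitive loss: zero whenever |r| <= eps, linear outside
    the tube. The flat region assigns no cost to residuals within the interval."""
    return np.maximum(np.abs(r) - eps, 0.0)

def eps_insensitive_sq(r, eps):
    """A quadratic variant: zero inside the tube, squared penalty outside."""
    return np.maximum(np.abs(r) - eps, 0.0) ** 2
```

The flat region of width 2ε is what encodes the quantization-interval uncertainty: any function whose output lands inside the interval incurs zero loss.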
The rest of this section develops solvers for (9) and establishes a link between (9) and SVMs. To this end, note that application of the representer theorem to (9) yields, as in Sec. III-A, an estimate with
Now focus on and note that , where and respectively quantify positive deviations of with respect to the right and left endpoints of . This implies that satisfies and , whereas satisfies and , thus establishing the following result.
The problem in (10) with the -insensitive loss function can be expressed as
where and .
Problem (11) is a convex quadratic program with slack variables . Although one can obtain using, for example, an off-the-shelf interior-point solver, it will be shown that a more efficient approach is to solve the dual-domain version of (11).
To the best of our knowledge, (11) constitutes the first application of an -insensitive loss to estimating functions from quantized data. As expected from the choice of loss function and regularizer, (11) is an SVM-type problem. However, different from existing SVMs, for which data comprises vector-valued noisy versions of [26, Examples 1 and 2], the estimate in (11) relies on noisy versions of the scalars . Therefore, (11) constitutes a new class of SVM for vector-valued function estimation. As a desirable consequence of this connection, the proposed estimate inherits the well-documented generalization performance of existing SVMs [26, 35, 11]. However, it is prudent to highlight one additional notable difference between (11) and conventional SVMs that pertains to the present context of function estimation from quantized data: whereas in the present setting is determined by the quantization interval length, this parameter must be delicately tuned in conventional SVMs to control generalization performance.
The proposed estimator is nonparametric since the number of unknowns in (11) depends on the number of observations . Although this number of unknowns also grows with , it is shown next that this is not the case in the dual formulation. To see this, let as well as . With and representing the Lagrange multipliers associated with the and the constraints, the dual of (11) can be easily shown to be
From the Karush-Kuhn-Tucker (KKT) conditions, the primal variables can be recovered from the dual ones using
Algorithm 1 lists the steps involved in the proposed nonparametric estimator. Note that the primal (11) entails variables, whereas the dual (12) has just . The latter can be solved using sequential minimal optimization algorithms , which here afford a simplified implementation along the lines of, e.g., , because there is no bias term. However, for moderate problem sizes (), interior-point solvers are more reliable [32, Ch. 10] while having worst-case complexity . As a desirable byproduct, interior-point methods directly provide the Lagrange multipliers, which are useful for recovering the primal variables (cf. (23)).
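For intuition, a scalar-valued, bias-free analogue of the dual can be solved with a simple proximal-gradient iteration. This is a pedagogical sketch, not the sequential-minimal-optimization or interior-point routines discussed above, and all names are illustrative:

```python
import numpy as np

def svr_dual_solve(K, y, eps, C, iters=5000):
    """Proximal-gradient (ISTA-style) solver for the bias-free epsilon-SVR dual,
    written in terms of beta_n = alpha_n - alpha*_n (the multiplier differences):
        maximize  -0.5 * beta' K beta + y' beta - eps * ||beta||_1
        subject to  -C <= beta_n <= C.
    """
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)  # 1 / Lipschitz constant of smooth part
    beta = np.zeros(len(y))
    for _ in range(iters):
        v = beta - step * (K @ beta - y)                          # gradient step
        v = np.sign(v) * np.maximum(np.abs(v) - step * eps, 0.0)  # prox of eps * |.|_1
        beta = np.clip(v, -C, C)                                  # project onto the box
    return beta  # fitted function: f(x) = sum_n beta_n k(x, x_n)
```

At optimality, and when the box constraints are inactive, the stationarity conditions force the fitted values to stay within the ε-tube around the data, which is the sparsity-inducing behavior noted in Sec. III-B.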
III-C Semiparametric regression using quantized data
The nonparametric estimators in Secs. III-A and III-B are universal in the sense that they can approximate wide classes of functions, including all continuous functions vanishing at infinity, with arbitrary accuracy provided that the number of measurements is large enough . However, since measurements are limited in number and can furthermore be quantized, incorporating available prior knowledge is crucial to improve the accuracy of the estimates. One could therefore consider applying parametric approaches since they can readily incorporate various forms of prior information. However, these approaches lack the flexibility of nonparametric techniques since they can only estimate functions in very limited classes. Semiparametric alternatives offer a “sweet spot” by combining the merits of both approaches .
This section presents semiparametric estimators capable of capturing prior information about the propagation environment yet preserving the flexibility of the nonparametric estimators in Secs. III-A and III-B. To this end, an estimate of the form is pursued, where (cf. Secs. III-A and III-B) the nonparametric component belongs to an RKHS with kernel matrix ; whereas the parametric component is given by
with collecting user-selected basis matrix functions , , and .
If the transmitter locations are approximately known, the free space propagation loss can be described by matrix basis functions of the form , where is the attenuation between a transmitter located at and a receiver located at an arbitrary point ; see Sec. V for an example. Note that if this basis accurately captures the propagation effects in , then the -th entry of the estimated is approximately proportional to the transmit power of the -th transmitter.
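A possible instantiation of such basis functions, assuming free-space attenuation with a known path-loss exponent and approximately known transmitter locations (the names, the exponent, and the near-field clipping below are illustrative choices):

```python
import numpy as np

def freespace_basis(x, tx_locs, gamma=2.0, d0=1.0):
    """Parametric-basis vector at location x: the m-th entry is the free-space
    attenuation (d0 / d_m)^gamma from the approximately known m-th transmitter
    location, with distances clipped below the reference distance d0 to avoid
    a singularity at the transmitter. gamma is the path-loss exponent."""
    x = np.asarray(x, dtype=float)
    d = np.array([max(np.linalg.norm(x - np.asarray(t, dtype=float)), d0)
                  for t in tx_locs])
    return (d0 / d) ** gamma
```

When this basis matches the propagation in the region, the fitted coefficient of the m-th basis function is approximately proportional to the m-th transmit power, as noted above.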
An immediate two-step approach to estimating is to first fit the data with in (14), and then fit the residuals with as detailed in Sec. III-B. Since this so-termed back-fitting approach is known to yield sub-optimal estimates , this paper pursues a joint fit, which constitutes a novel approach in kernel-based learning for vector-valued functions. To this end, define as the space of functions (not necessarily an RKHS) that can be written as , with and as in (14). One can thereby seek semiparametric estimates of the form
where the regularizer involves only the nonparametric component through the norm in . Using [2, Th. 3.1], one can readily generalize the representer theorem in [32, Th. 4.3] to the present semiparametric case. This yields , where
The problem in (III-C) with the -insensitive loss function can be expressed as
The primal problem in (17) entails vectors of size , which motivates solving its dual version. Upon defining as an matrix whose -th block is , and denoting by and the Lagrange multipliers associated with the and constraints, the dual of (17) is
where . Except for the last constraint and the usage of , (18) is identical to (12). Similar to (13a), the primal vector of the nonparametric component can be readily obtained from the KKT conditions as
Although can also be obtained from the KKT conditions, this approach is numerically unstable. It is preferable to obtain from the Lagrange multipliers of (18), which are available, e.g., when an interior-point solver is employed. Specifically, noting that (17) is the dual of (18), it can be seen that equals the multipliers of the last constraint in (18). Algorithm 2 summarizes the proposed semiparametric estimator.
III-D Regression with conditionally positive definite kernels
So far, the kernels were required to be positive definite. This section extends the semiparametric estimator in Sec. III-C to accommodate the wider class of conditionally positive definite (CPD) kernels. CPD kernels are natural for estimation problems that are invariant to translations in the data [32, p. 52], as occurs in spectrum cartography. Accommodating CPD kernels also offers a generalization of thin-plate splines (TPS), which have well-documented merits in capturing shadowing of propagation channels , to operate on quantized data.
Consider the following definition, which generalizes that of scalar CPD kernels [32, Sec. 2.4]. Recall that, given , is a matrix whose -th block is .
A kernel is CPD with respect to if it satisfies for every finite set and all such that .
Observe that any positive definite kernel is also CPD since it satisfies for all .
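The definition can be probed numerically. The kernel below is a classic scalar example that is CPD (here with respect to constant functions, i.e., coefficient vectors summing to zero) but not positive semidefinite:

```python
import numpy as np

def neg_dist_kernel(X):
    """k(x, y) = -||x - y||: a classic scalar kernel that is CPD of order one
    (c' K c >= 0 whenever sum(c) = 0) yet not positive semidefinite."""
    return -np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def cpd_check(K, trials=200, seed=0):
    """Empirically test the CPD condition by drawing random coefficient vectors
    satisfying the moment constraint sum(c) = 0."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        c = rng.standard_normal(K.shape[0])
        c -= c.mean()              # enforce sum(c) = 0
        if c @ K @ c < -1e-9:      # small slack for floating-point error
            return False
    return True
```

The Gram matrix of this kernel has zero trace, so it necessarily has negative eigenvalues; the CPD condition only constrains the quadratic form on the zero-sum subspace.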