## I Introduction

Power spectral density (PSD) cartography relies on sensor measurements to map the radiofrequency (RF) signal power distribution over a geographical region. The resulting maps are instrumental for various wireless network management tasks, such as power control, interference mitigation, and planning [21, 12]. For instance, PSD maps benefit wireless network planning by revealing the location of crowded regions and areas of weak coverage, thereby suggesting where new base stations should be deployed. Because they characterize how RF power distributes per channel across space, PSD maps are also useful to increase frequency reuse and mitigate interference in cellular systems. In addition, PSD maps enable opportunistic transmission in cognitive radio systems by unveiling underutilized “white spaces” in time, space, and frequency [5, 42]. Different from conventional spectrum sensing techniques, which assume a common spectrum occupancy over the entire sensed region [28, 3, 25], PSD cartography accounts for variability across space and therefore enables a more aggressive spatial reuse.

A number of interpolation and regression techniques have been applied to construct RF power maps from power measurements. Examples include kriging [1], compressive sampling [19], dictionary learning [23, 22], sparse Bayesian learning [18], and matrix completion [15]. These maps describe how power distributes over space but not over frequency. To accommodate variability along the frequency dimension as well, a basis expansion model was introduced in [6, 13, 8] to create PSD maps from periodograms. To alleviate the high power consumption and bandwidth needs that stem from obtaining and communicating these periodograms, [25] proposed a low-overhead sensing scheme based on single-bit data along the lines of [29]. However, this scheme assumes that the PSD is constant across space. To summarize, existing spectrum cartography approaches either
construct *power* maps from *power* measurements, or,
*PSD* maps from *PSD* measurements. In contrast, the main
contribution of this paper is to present algorithms capable of
estimating *PSD* maps from *power* measurements, thus
attaining a more efficient extraction of the information contained in
the observations than existing methods. Therefore, the proposed
approach enables the estimation of the RF power distribution over
*frequency and space* using low-cost low-power sensors since only
power measurements are required.

To facilitate practical implementations with sensor networks, where the communication bandwidth is limited, overhead is reduced by adopting two measures. First, sensor measurements are quantized to a small number of bits. Second, the available prior information is efficiently captured in both frequency and spatial domains, thus affording strong quantization while minimally sacrificing the quality of map estimates.

Specifically, a great deal of frequency-domain prior information about impinging communication waveforms can be collected from spectrum regulations and standards, which specify bandwidth, carrier frequencies, transmission masks, roll-off factors, number of sub-carriers, and so forth [38, 31]. To exploit this information, a basis expansion model is adopted, which allows the estimation of the power of each sub-channel and background noise as a byproduct. The resulting estimates can be employed to construct signal-to-noise ratio (SNR) maps, which reveal weak coverage areas, and alleviate the well-known noise uncertainty problem in cognitive radio [36].

To incorporate varying degrees of prior information in the spatial domain, nonparametric and semiparametric estimators are developed within the framework of kernel-based learning for vector-valued functions [26]. Nonparametric estimates are well suited for scenarios where no prior knowledge about the propagation environment is available due to their ability to approximate any spatial field with arbitrarily high accuracy [11]. In many cases, however, one may approximately know the transmitter locations, the path loss exponent, or even the locations of obstacles or reflectors. The proposed semiparametric estimators capture these forms of prior information through a basis expansion in the spatial domain.

Although the proposed estimators can be efficiently implemented in batch mode, limited computational resources may hinder real-time operation if an extensive set of measurements is to be processed. This issue is mitigated here through an online nonparametric estimation algorithm based on stochastic gradient descent. Remarkably, the proposed algorithm can also be applied to (vector-valued) function estimation setups besides spectrum cartography.

The present paper also contains two theoretical contributions to machine learning. First, a neat connection is established between robust (possibly vector-valued) function estimation from quantized measurements and support vector machines (SVMs) [32, 34, 35]. Through this link, theoretical guarantees and efficient implementations of SVMs permeate to the proposed methods. Second, the theory of kernel-based learning for vector-valued functions is extended to accommodate semiparametric estimation. The resulting methods are of independent interest since they can be applied beyond the present spectrum cartography context and subsume, as a special case, thin-plate splines regression [40, 8].

The rest of the paper is organized as follows. Sec. II presents the system model and formulates the problem. Sec. III proposes nonparametric and semiparametric PSD map estimation algorithms operating in batch mode, whereas Sec. IV develops an online solver. Finally, Sec. V presents some numerical tests and Sec. VI concludes the paper with closing remarks and research directions.

Notation. The cardinality of set $\mathcal{A}$ is denoted by $|\mathcal{A}|$. Scalars are denoted by lowercase letters, column vectors by boldface lowercase letters, and matrices by boldface uppercase letters. Superscript $^\top$ stands for transposition, and $^H$ for conjugate transposition. The $(i,j)$-th entry ($j$-th column) of matrix $\mathbf{A}$ is denoted by $[\mathbf{A}]_{i,j}$ ($\mathbf{a}_j$). The Kronecker product is represented by the symbol $\otimes$. The Khatri-Rao product is defined for $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_N]$ and $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_N]$ as $\mathbf{A} \odot \mathbf{B} := [\mathbf{a}_1 \otimes \mathbf{b}_1, \ldots, \mathbf{a}_N \otimes \mathbf{b}_N]$. The entrywise or Hadamard product is defined as $[\mathbf{A} \circ \mathbf{B}]_{i,j} := [\mathbf{A}]_{i,j} [\mathbf{B}]_{i,j}$. Vector $\mathbf{e}_i$ is the $i$-th column of the identity matrix $\mathbf{I}_N$, whereas $\mathbf{0}_N$ and $\mathbf{1}_N$ are the vectors of dimension $N$ with all zeros and ones, respectively. Symbol $\mathbb{E}$ denotes expectation, $\mathbb{P}$ probability, $\operatorname{tr}$ trace, $\lambda_{\max}$ largest eigenvalue, and $*$ convolution. Notation $\lceil x \rceil$ (alternatively $\lfloor x \rfloor$) represents the smallest (largest) integer satisfying $\lceil x \rceil \geq x$ ($\lfloor x \rfloor \leq x$).

## II System Model and Problem Statement

Consider $N$ transmitters located in a geographical region $\mathcal{X} \subset \mathbb{R}^d$, where $d$ is typically $2$ (one may set $d = 1$ for maps along roads or railways, or even $d = 3$ for applications involving aerial vehicles or urban environments). Let $\psi_n(f)$ denote the transmit-PSD of the $n$-th transmitter and let $g_n(\mathbf{x})$ represent the gain of the channel between the $n$-th transmitter and location $\mathbf{x} \in \mathcal{X}$, which is assumed frequency flat to simplify the exposition; see Remark 1. If the transmitted signals are uncorrelated, the PSD at location $\mathbf{x}$ is given by

$$\Phi(\mathbf{x}, f) = \sum_{n=1}^{N} g_n(\mathbf{x})\,\psi_n(f) + \sigma^2\,\psi_0(f) \qquad (1)$$

where $\sigma^2$ is the noise power and $\psi_0(f)$ is the noise PSD, normalized to $\int \psi_0(f)\,df = 1$. In view of (1), one can also normalize $\psi_n(f)$, $n = 1, \ldots, N$, without any loss of generality to satisfy $\int \psi_n(f)\,df = 1$ by absorbing any scaling factor into $g_n(\mathbf{x})$. Since such a scaling factor equals the transmit power, the coefficient $g_n(\mathbf{x})$ actually represents the power of the $n$-th signal at location $\mathbf{x}$.
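To make the model concrete, the following sketch evaluates a toy instance of the PSD map: a sum of per-transmitter terms plus a noise term. The Gaussian-shaped transmit PSDs, the transmitter locations, and the power-law decay of the coefficients are illustrative assumptions for this example, not part of the paper's model.

```python
import numpy as np

f = np.linspace(0.0, 100.0, 2001)   # frequency grid (arbitrary units)
df = f[1] - f[0]

def unit_psd(fc, bw):
    """Normalized transmit PSD (Gaussian shape, illustrative), integrating to 1."""
    p = np.exp(-0.5 * ((f - fc) / (bw / 2.0)) ** 2)
    return p / (p.sum() * df)

psi = [unit_psd(fc, 8.0) for fc in (20.0, 50.0, 80.0)]   # one PSD per transmitter
psi0 = np.full_like(f, 1.0 / (f[-1] - f[0]))             # flat unit-area noise PSD
tx_locs = np.array([[0.0, 0.0], [50.0, 0.0], [25.0, 40.0]])

def g(x, alpha=2.0):
    """Hypothetical power coefficients: power-law decay with distance to each tx."""
    d = np.linalg.norm(tx_locs - x, axis=1)
    return 1.0 / (1.0 + d) ** alpha

def psd_map(x, sigma2=1e-3):
    """PSD at location x: per-transmitter terms plus the scaled noise PSD."""
    return sum(gn * pn for gn, pn in zip(g(x), psi)) + sigma2 * psi0

Phi = psd_map(np.array([10.0, 5.0]))   # PSD value per frequency bin at one location
```

Only power-type functionals of this map are observed by the sensors; the map itself is what the paper sets out to recover.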

Often in practice, the normalized PSDs are approximately known since transmitters typically adhere to publicly available standards and regulations, which prescribe spectral masks, bandwidths, carrier frequencies, roll-off factors, number of pilots, and so on [38, 31]. If unknown, the methods here carry over after setting to be a frequency-domain basis expansion model [6, 8]. For this reason, the rest of the paper assumes that are known.

The goal is to estimate the PSD map from the measurements gathered by sensors with locations . In view of (1), this task is tantamount to estimating at every spatial coordinate .

To minimize hardware costs and power consumption, this paper adopts the sensor architecture in Fig. 1. The *ensemble* power at the output of the filter of the $m$-th sensor is $p_m = \int |B_m(f)|^2\, \Phi(\mathbf{x}_m, f)\, df$, where $B_m(f)$ denotes the frequency response of the receive filter. From (1), it follows that

$$p_m = \sum_{n=1}^{N} b_{m,n}\, g_n(\mathbf{x}_m) + \sigma^2 b_{m,0} \qquad (2)$$

where $b_{m,n} := \int |B_m(f)|^2\, \psi_n(f)\, df$ can be thought of as the contribution of the $n$-th transmitter per unit of $g_n(\mathbf{x}_m)$ to the power at the output of the receive filter of the $m$-th sensor; whereas $\mathbf{b}_m := [b_{m,0}, b_{m,1}, \ldots, b_{m,N}]^\top$ and $\mathbf{g}(\mathbf{x}) := [\sigma^2, g_1(\mathbf{x}), \ldots, g_N(\mathbf{x})]^\top$ are introduced to simplify notation, so that $p_m = \mathbf{b}_m^\top \mathbf{g}(\mathbf{x}_m)$. The $m$-th sensor obtains an estimate $\hat p_m$ of $p_m$ by measuring the signal power at the output of the filter over a certain time window. In general, $\hat p_m \neq p_m$ due to the finite length of this observation window.
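The filter-domain coefficients can be approximated numerically. The sketch below assumes an ideal band-pass receive filter and unit-area Gaussian-shaped transmit PSDs (both hypothetical), computes each coefficient as the integral of the squared filter response times the corresponding normalized PSD, and then forms the ensemble power as a weighted sum of the power coefficients.

```python
import numpy as np

f = np.linspace(0.0, 100.0, 2001)   # frequency grid (arbitrary units)
df = f[1] - f[0]

def unit_psd(fc, bw):
    """Unit-area transmit PSD (Gaussian shape, illustrative)."""
    p = np.exp(-0.5 * ((f - fc) / (bw / 2.0)) ** 2)
    return p / (p.sum() * df)

psi = np.stack([unit_psd(fc, 8.0) for fc in (20.0, 50.0, 80.0)])  # N = 3 PSDs
psi0 = np.full_like(f, 1.0 / (f[-1] - f[0]))                      # flat noise PSD

# Squared magnitude response of an ideal band-pass receive filter (assumed shape)
B2 = ((f >= 40.0) & (f <= 60.0)).astype(float)

# Contribution of each transmitter per unit of its power coefficient:
# the integral of the squared filter response times the corresponding PSD
b = (B2 * psi).sum(axis=1) * df
b0 = (B2 * psi0).sum() * df

g_at_sensor = np.array([0.5, 1.2, 0.1])   # hypothetical powers at the sensor location
sigma2 = 1e-2
p_m = b @ g_at_sensor + sigma2 * b0       # ensemble power at the filter output
```

Note how the second transmitter, whose PSD lies inside the filter passband, dominates the corresponding coefficient, while out-of-band transmitters contribute almost nothing.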

Different from the measurement model in [6, 13, 8], where sensors obtain and communicate periodograms, the proposed scheme solely involves power measurements, thereby reducing sensor costs and bandwidth requirements. To further reduce bandwidth, can be quantized as illustrated in the bottom part of Fig. 1. When uniform quantization is used, the sensors obtain

$$z_m = \Delta \left\lfloor \frac{\hat p_m}{\Delta} \right\rfloor + \frac{\Delta}{2} \qquad (3)$$

where $\Delta$ is the quantization step; see also Remark 5. If $L$ denotes the number of quantization levels, which depends on $\Delta$ and the range of $\hat p_m$, the number of bits needed is now just $\lceil \log_2 L \rceil$. Depending on how accurate $\hat p_m$ is, either $|p_m - z_m| \leq \Delta/2$ or $|p_m - z_m| > \Delta/2$. The latter event is termed *measurement error* and is due to the finite length of the aforementioned time window.
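A uniform quantizer with the properties described above, where each reported value is the centroid of its interval, can be sketched as follows; the step size and measurement values are arbitrary.

```python
import numpy as np

def quantize(p_hat, delta):
    """Uniform quantization: report the centroid of the interval containing p_hat."""
    return delta * np.floor(p_hat / delta) + delta / 2.0

def num_bits(p_range, delta):
    """Bits needed to index the quantization levels covering [0, p_range]."""
    levels = int(np.ceil(p_range / delta))
    return max(1, int(np.ceil(np.log2(levels))))

p_hat = np.array([0.03, 0.26, 0.99])   # toy power estimates
z = quantize(p_hat, delta=0.1)         # each entry moves to its interval centroid
print(z, num_bits(1.0, 0.1))           # 10 levels over [0, 1] fit in 4 bits
```

By construction, the reported value is never farther than half a step from the measurement, which is exactly the interval information exploited by the estimators of Sec. III-B.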

Finally, the sensors communicate the measurements $\{\hat p_m\}$ or $\{z_m\}$ to the fusion center. Given these measurements, together with the sensor locations $\{\mathbf{x}_m\}$ and the coefficients $\{\mathbf{b}_m\}$, the *problem* is to estimate $\{g_n(\mathbf{x})\}_{n=1}^{N}$ and $\sigma^2$ at every $\mathbf{x} \in \mathcal{X}$. The latter can be viewed individually as functions of the spatial coordinate $\mathbf{x}$, or altogether as the vector field $\mathbf{g}(\mathbf{x})$. Thus, estimating the PSD map in (1) is in fact a problem of estimating a vector-valued function.

###### Remark 1 (Frequency-selective channels).

If the channels are not frequency-flat, then each term in the sum of
(1) can be decomposed into multiple components of
smaller bandwidth in such a way that each one *sees* an approximately
frequency-flat channel. To ensure that these components are
uncorrelated, one can choose their frequency supports to be disjoint.

###### Remark 2.

Sensors can also operate digitally. In this case, one could implement the receive filters to have pseudo-random impulse responses [25]. Selecting distinct seeds for the random number generators of different sensors yields linearly independent with a high probability, which ensures identifiability of (cf. (2)).

###### Remark 3.

If a wideband map is to be constructed, then Nyquist-rate sampling may be too demanding for low-cost sensors. In this scenario, one can replace the filter in Fig. 1 with an analog-to-information-converter (A2IC) [37, 31]. To see that (2) still holds and therefore the proposed schemes still apply, let represent the compression matrix of the th sensor, which multiplies raw measurement blocks to yield compressed data blocks [31]. The ensemble power of the latter is proportional to , where denotes the covariance matrix of the uncompressed data blocks, and the covariance matrix of the blocks transmitted from the -th transmitter. Combining both equalities yields (2) upon defining .

## III Learning PSD Maps

This section develops various PSD map estimators offering different bandwidth-performance trade-offs. First, Sec. III-A puts forth a nonparametric estimator to recover from un-quantized power measurements. To reduce bandwidth requirements, this approach is extended in Sec. III-B to accommodate quantized data. The detrimental impact of strong quantization on the quality of map estimates is counteracted in Sec. III-C by leveraging propagation prior information. For simplicity, these methods are presented for the scenario where each sensor obtains a single measurement, whereas general versions accommodating multiple measurements per sensor are outlined in Sec. III-E.

### III-A Estimation via nonparametric regression

This section reviews the background in kernel-based learning necessary to develop the cartography tools in the rest of the paper and presents an estimator to obtain PSD maps from un-quantized measurements.

Kernel-based regression seeks estimates among a wide class of
functions termed *reproducing kernel Hilbert space* (RKHS). In
the present setting of vector-valued functions, such an RKHS is given
by [26, 11], where
are expansion coefficient
vectors and is termed reproducing kernel
map. The latter is any matrix-valued function that
is (i) symmetric, meaning that for any and ; and (ii) positive (semi)definite,
meaning that the square matrix having
as its block is positive semi-definite for any . Remark 4 guides the
selection of functions qualifying as
reproducing kernels.

As any Hilbert space, an RKHS has an associated inner product, not necessarily equal to the classical $\mathcal{L}^2$ inner product. Specifically, the inner product between two RKHS functions $\mathbf{f} = \sum_i \mathbf{K}(\cdot, \mathbf{x}_i)\,\mathbf{c}_i$ and $\mathbf{f}' = \sum_j \mathbf{K}(\cdot, \mathbf{x}'_j)\,\mathbf{c}'_j$ can be obtained through the reproducing kernel as

$$\langle \mathbf{f}, \mathbf{f}' \rangle_{\mathcal{H}} = \sum_i \sum_j \mathbf{c}_i^\top\, \mathbf{K}(\mathbf{x}_i, \mathbf{x}'_j)\, \mathbf{c}'_j \qquad (4)$$

This expression is referred to as the *reproducing property* and is of paramount importance since it allows the computation of function inner products without integration. From (4), the induced RKHS norm of $\mathbf{f}$ can be written as $\|\mathbf{f}\|_{\mathcal{H}}^2 = \sum_i \sum_j \mathbf{c}_i^\top\, \mathbf{K}(\mathbf{x}_i, \mathbf{x}_j)\, \mathbf{c}_j$ and is widely used as a proxy for smoothness of $\mathbf{f}$.

Kernel-based methods confine their search of estimates to functions in , which is not a limitation since an extensive class of functions, including any continuous function vanishing at infinity, can be approximated with arbitrary accuracy by a function in for a properly selected kernel [11]. However, function estimation is challenging since (i) any finite set of samples generally admits infinitely many interpolating functions in and (ii) the estimate does not generally approach the estimated function even for an arbitrary large number of noisy samples if the latter are overfitted. To mitigate both issues, kernel regression seeks estimates minimizing the sum of two terms, where the first penalizes estimates deviating from the observations and the second promotes smoothness. Specifically, if

$\mathcal{L}$ denotes a loss function of the measurement error $\hat p_m - \mathbf{b}_m^\top \mathbf{f}(\mathbf{x}_m)$, the nonparametric kernel-based regression estimate of $\mathbf{g}$ is [9]

$$\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f} \in \mathcal{H}}\; \sum_{m=1}^{M} \mathcal{L}\big(\hat p_m - \mathbf{b}_m^\top \mathbf{f}(\mathbf{x}_m)\big) + \mu \|\mathbf{f}\|_{\mathcal{H}}^2 \qquad (5)$$

where the user-selected scalar $\mu > 0$ controls the tradeoff between fitting the data and smoothness of the estimate, captured by its RKHS norm.

Since $\mathcal{H}$ is infinite-dimensional, solving (5) directly is generally not possible with a finite number of operations. Fortunately, the so-called *representer theorem* (see e.g., [32, 2]) asserts that $\hat{\mathbf{f}}$ in (5) is of the form

$$\hat{\mathbf{f}}(\mathbf{x}) = \sum_{m=1}^{M} \mathbf{K}(\mathbf{x}, \mathbf{x}_m)\, \mathbf{c}_m \qquad (6)$$

for some vectors $\{\mathbf{c}_m\}_{m=1}^{M}$, where $\mathbf{K}(\mathbf{x}, \mathbf{x}_m)$ is a square matrix of the dimension of the vector field and $\mathbf{c}_m$ a vector of matching dimension. In words, the solution to (5) admits a kernel expansion around the sensor locations $\{\mathbf{x}_m\}_{m=1}^{M}$. From (6), it follows that finding $\hat{\mathbf{f}}$ amounts to finding $\{\mathbf{c}_m\}$, but the latter can easily be obtained by solving the problem that results from substituting (6) into (5):

(7) |

In (7), the matrix is formed to have as its -th block. Clearly, the minimizer of (5) can then be recovered as .

The loss function is typically chosen to be convex. The simplest example is the squared loss, with the resulting estimate referred to as the *kernel ridge regression*

(8) |

Besides its simplicity of implementation, the estimate with as in
(8) offers a twofold advantage over
existing cartography schemes. First, existing estimators relying on
*power measurements* can only construct *power
maps* [1, 19, 18, 23, 22],
whereas the proposed method is capable of obtaining *PSD maps*
from the same measurements. On the other hand, existing methods for
estimating *PSD maps* require *PSD measurements*, that is,
every sensor must obtain and transmit periodograms to the fusion
center. This necessitates a higher communication bandwidth, longer sensing
time, and more costly sensors than required
here [6, 13, 8].

###### Remark 4.

The choice of the kernel considerably affects the estimation performance when the number of observations is small. Thus, it is important to choose a kernel that is well-suited to the spatial variability of the true . To do so, one may rely on cross validation, historical data [32, Sec. 2.3], or multi-kernel approaches [7]. Although specifying matrix-valued kernels is more challenging than specifying their scalar counterparts () [26], a simple but effective possibility is to construct a diagonal kernel as where are valid scalar kernels. For example, can be the popular Gaussian kernel , where is user selected.
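As an illustration of the batch estimator of this section, the sketch below builds the block kernel matrix with a diagonal Gaussian kernel (cf. Remark 4) and fits the expansion coefficients for the special case of the squared loss, for which the minimizer admits a closed form. The scenario, the names (`Bvec` for the filter coefficients, `p_hat` for the power measurements), and the closed-form expression derived in the comments are assumptions for this toy example, not a verbatim transcription of the paper's equations.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 30, 3                                # sensors, dimension of the vector field
X = rng.uniform(0, 50, size=(M, 2))         # sensor locations (toy scenario)
Bvec = rng.uniform(0.5, 1.5, size=(M, D))   # assumed per-sensor filter coefficients

def true_g(x):
    """Hypothetical ground-truth field used only to simulate measurements."""
    ctrs = np.array([[10.0, 10.0], [40.0, 5.0], [25.0, 45.0]])
    return np.exp(-np.linalg.norm(ctrs - x, axis=1) / 20.0)

p_hat = np.array([Bvec[m] @ true_g(X[m]) for m in range(M)])
p_hat += 0.01 * rng.standard_normal(M)      # finite-window measurement noise

def kscalar(x, y, s=15.0):
    """Scalar Gaussian kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * s ** 2))

# Block kernel matrix for the diagonal matrix-valued kernel K(x, x') = k(x, x') I_D
Kbar = np.kron(
    np.array([[kscalar(X[i], X[j]) for j in range(M)] for i in range(M)]),
    np.eye(D),
)

# Bmat places b_m^T in the m-th block of row m, so predictions are Bmat @ Kbar @ c
Bmat = np.zeros((M, M * D))
for m in range(M):
    Bmat[m, m * D:(m + 1) * D] = Bvec[m]

# Squared-loss fit: the gradient of ||p_hat - Bmat Kbar c||^2 + mu c^T Kbar c
# vanishes at c = (Bmat^T Bmat Kbar + mu I)^{-1} Bmat^T p_hat
mu = 1e-3
c = np.linalg.solve(Bmat.T @ Bmat @ Kbar + mu * np.eye(M * D), Bmat.T @ p_hat)

def f_hat(x):
    """Kernel expansion around the sensor locations, cf. the representer theorem."""
    k = np.array([kscalar(x, X[j]) for j in range(M)])
    return (k[:, None] * c.reshape(M, D)).sum(axis=0)
```

The same scaffolding carries over to the quantized case of Sec. III-B; only the loss, and hence the solver, changes.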

### Iii-B Nonparametric regression from quantized data

The scheme in Sec. III-A offers a drastic bandwidth reduction relative to competing PSD map estimators since only scalar-valued measurements need to be communicated to the fusion center. The methods in this section accomplish a further reduction by accommodating quantized measurements.

Recall from Sec. II that $\{z_m\}$ denote the result of uniformly quantizing the power measurements $\{\hat p_m\}$; see also Remark 5. The former essentially convey *interval information* about the latter, since (3) implies that $\hat p_m$ is contained in the interval $[z_m - \Delta/2,\, z_m + \Delta/2]$. Note that $z_m$ is in fact the centroid of the corresponding quantization interval.

To account for the uncertainty within such an interval, one can replace in (5) with as

(9) |

and set $\varepsilon = \Delta/2$ to assign no cost to all candidate functions that lead to values of $\mathbf{b}_m^\top \mathbf{f}(\mathbf{x}_m)$ falling around $z_m$. In other words, such an $\mathcal{L}$ only penalizes functions for which $\mathbf{b}_m^\top \mathbf{f}(\mathbf{x}_m)$ falls outside of the corresponding quantization interval. Examples of these $\varepsilon$-*insensitive loss functions* include [32, 34], and a less known variant. Incidentally, these functions endow the proposed estimators with robustness to outliers and promote sparsity in the expansion coefficients, which is a particularly well-motivated property when the number of measurement errors is small relative to the number of measurements, that is, when the true power lies in the reported quantization interval for *most* of the measurements.
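A minimal sketch of such a loss, assuming the classical max(0, |e| − ε) form, illustrates the zero-cost region: with ε set to half the quantization step, any predicted power inside the reported interval incurs no penalty.

```python
import numpy as np

def eps_insensitive(e, eps):
    """Vapnik-style epsilon-insensitive loss: zero inside the tube, linear outside."""
    return np.maximum(0.0, np.abs(e) - eps)

delta = 0.1                 # quantization step
eps = delta / 2.0           # insensitivity zone matches the interval half-width
z_m = 0.25                  # reported centroid; the interval is [0.20, 0.30]

for pred in (0.22, 0.30, 0.40):          # candidate predicted powers
    loss = eps_insensitive(pred - z_m, eps)
    print(pred, loss)                    # zero inside the interval, linear outside
```

The flat region is also what produces the sparsity noted above: measurements fitted inside their interval contribute nothing to the gradient, so their coefficients can vanish at the optimum.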

The rest of this section develops solvers for (9) and establishes a link between (9) and SVMs. To this end, note that application of the representer theorem to (9) yields, as in Sec. III-A, an estimate with

(10) |

Now focus on and note that , where and respectively quantify positive deviations of with respect to the right and left endpoints of . This implies that satisfies and , whereas satisfies and , thus establishing the following result.

###### Proposition 1.

Problem (11) is a convex quadratic program with slack variables . Although one can obtain using, for example, an off-the-shelf interior-point solver, it will be shown that a more efficient approach is to solve the dual-domain version of (11).

To the best of our knowledge, (11) constitutes the first
application of an -insensitive loss to estimating functions
from quantized data. As expected from the choice of loss function and
regularizer, (11) is an SVM-*type* problem. However,
different from existing SVMs, for which data comprises
*vector-valued* noisy versions of [26, Examples 1 and 2],
the estimate in (11) relies on noisy versions of the
*scalars* . Therefore, (11) constitutes a new class of SVM
for vector-valued function estimation. As a desirable consequence of
this connection, the proposed estimate inherits the well-documented
generalization performance of existing
SVMs [26, 35, 11]. However,
it is prudent to highlight one additional notable difference between
(11) and conventional SVMs that pertains to the present context
of function estimation from quantized data: whereas in the present
setting is determined by the quantization interval length,
this parameter must be delicately tuned in conventional SVMs to
control generalization performance.

The proposed estimator is *nonparametric* since the number of
unknowns in (11) depends on the number of observations
. Although this number of unknowns also grows with , it is shown
next that this is not the case in the dual formulation. To see this,
let as well as . With and
representing the Lagrange multipliers associated with the
and the constraints, the dual of (11) can be
easily shown to be

(12) |

From the Karush-Kuhn-Tucker (KKT) conditions, the primal variables can be recovered from the dual ones using

(13a) | ||||

(13b) | ||||

(13c) |

Algorithm 1 lists the steps involved in the
proposed nonparametric estimator. Note that the primal
(11) entails variables whereas the dual
(12) has just . The latter can
be solved using sequential minimal optimization
algorithms [27], which here can afford simplified
implementation along the lines of e.g., [20]
because there is no bias term. However, for moderate problem
sizes (), interior point solvers are more
reliable [32, Ch. 10] while having *worst-case*
complexity . As a desirable byproduct, interior
point methods directly provide the Lagrange multipliers, which are
useful for recovering the primal variables (cf. (23)).

### Iii-C Semiparametric regression using quantized data

The nonparametric estimators in Secs. III-A and
III-B are universal in the sense that they can approximate
wide classes of functions, including all continuous functions
vanishing at infinity, with arbitrary accuracy provided that the
number of measurements is large
enough [11]. However, since measurements are
limited in number and can furthermore be quantized, incorporating
available prior knowledge is crucial to improve the accuracy of the
estimates. One could therefore consider applying *parametric*
approaches since they can readily incorporate various forms of prior
information. However, these approaches lack the flexibility of
nonparametric techniques since they can only estimate functions in
very limited classes. Semiparametric alternatives offer a “sweet
spot” by combining the merits of both
approaches [32].

This section presents semiparametric estimators capable of capturing prior information about the propagation environment yet preserving the flexibility of the nonparametric estimators in Secs. III-A and III-B. To this end, an estimate of the form is pursued, where (cf. Secs. III-A and III-B) the nonparametric component belongs to an RKHS with kernel matrix ; whereas the parametric component is given by

(14) |

with collecting user-selected basis matrix functions , , and .

If the transmitter locations are approximately known, the free space propagation loss can be described by matrix basis functions of the form , where is the attenuation between a transmitter located at and a receiver located at an arbitrary point ; see Sec. V for an example. Note that if this basis accurately captures the propagation effects in , then the -th entry of the estimated is approximately proportional to the transmit power of the -th transmitter.
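One plausible construction of such a basis, assuming known transmitter locations and a simple inverse power-law attenuation (both illustrative choices, not the paper's specific basis), is sketched below; each diagonal entry of the hypothetical basis matrix models one transmitter's spatial footprint.

```python
import numpy as np

TX_LOCS = np.array([[10.0, 10.0], [40.0, 5.0], [25.0, 45.0]])  # approximately known

def attenuation(x, tx, d0=1.0, alpha=2.0):
    """Hypothetical inverse power-law path loss between tx and location x."""
    return 1.0 / (d0 + np.linalg.norm(x - tx)) ** alpha

def basis(x):
    """Diagonal matrix basis: the n-th entry models the n-th transmitter's footprint."""
    return np.diag([attenuation(x, tx) for tx in TX_LOCS])

# Parametric component of the map: basis(x) @ w, with w collecting transmit powers
w = np.array([1.0, 2.0, 0.5])
par = basis(np.array([12.0, 12.0])) @ w
```

If the basis matches the propagation environment well, the fitted weights are directly interpretable as (approximate) transmit powers, as noted above.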

An immediate two-step approach to estimating is to first fit
the data with in (14), and then fit the
residuals with as detailed in Sec. III-B. Since this
so-termed *back-fitting* approach is known to yield sub-optimal
estimates [32], this paper pursues a joint fit, which
constitutes a novel approach in kernel-based learning for
vector-valued functions. To this end, define as the space of
functions (not necessarily an RKHS) that can be written as , with and
as in (14). One can thereby seek
semiparametric estimates of the form

(15) |

where the regularizer involves only the nonparametric component through the norm in . Using [2, Th. 3.1], one can readily generalize the representer theorem in [32, Th. 4.3] to the present semiparametric case. This yields , where

(16) |

Comparing (10) with (III-C), and replacing with yields the next result (cf. Proposition 1).

###### Proposition 2.

The problem in (III-C) with the $\varepsilon$-insensitive loss function can be expressed as

(17) |

The primal problem in (17) entails vectors of size , which motivates solving its dual version. Upon defining as an matrix whose -th block is , and representing the Lagrange multipliers associated with the and constraints by and , the dual of (17) is

(18) |

where . Except for the last constraint and the usage of , (18) is identical to (12). Similar to (13a), the primal vector of the nonparametric component can be readily obtained from the KKT conditions as

(19) |

Although can also be obtained from the KKT conditions, this approach is numerically unstable. It is preferable to obtain from the Lagrange multipliers of (18), which are known e.g. if an interior-point solver is employed. Specifically, noting that (17) is the dual of (18), it can be seen that equals the multipliers of the last constraint in (18). Algorithm 2 summarizes the proposed semiparametric estimator.

### Iii-D Regression with conditionally positive definite kernels

So far, the kernels were required to be positive definite. This
section extends the semiparametric estimator in Sec. III-C to
accommodate the wider class of *conditionally positive definite*
(CPD) kernels. CPD kernels are natural for estimation problems that
are invariant to translations in the data [32, p. 52],
as occurs in spectrum cartography.
Accommodating CPD kernels also offers a generalization of thin-plate
splines (TPS), which have well-documented merits in capturing
shadowing of propagation channels [8], to
operate on quantized data.

Consider the following definition, which generalizes that of scalar CPD kernels [32, Sec. 2.4]. Recall that, given , is a matrix whose -th block is .

###### Definition 1.

A kernel is CPD with respect to if it satisfies for every finite set and all such that .

Observe that any positive definite kernel is also CPD since it satisfies for all .
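This observation is easy to verify numerically: for a scalar Gaussian (positive definite) kernel, the quadratic form is nonnegative for every coefficient vector, and in particular for those satisfying the zero-sum CPD constraint. A small sketch, with an arbitrary point set:

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_gram(X, s=1.0):
    """Gram matrix of the (positive definite) scalar Gaussian kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * s ** 2))

X = rng.standard_normal((8, 2))   # arbitrary point set
K = gauss_gram(X)

for _ in range(100):
    c = rng.standard_normal(8)
    c -= c.mean()                 # impose the CPD constraint: coefficients sum to zero
    assert c @ K @ c >= -1e-10    # quadratic form stays nonnegative
print("positive definite kernel passes the CPD test")
```

A genuinely CPD-but-not-PD kernel, such as the thin-plate spline kernel mentioned above, satisfies the nonnegativity only on the constrained subspace, which is precisely why the constraint appears in Definition 1.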
