## 1 Introduction

Recent research in robot exploration and mapping has focused on developing adaptive sampling and active sensing algorithms [Cao, Low, and Dolan2013, Chen, Low, and Tan2013, Chen et al.2012, Hoang et al.2014, Low, Dolan, and Khosla2008, Low, Dolan, and Khosla2009, Low, Dolan, and Khosla2011, Low et al.2007, Low et al.2012, Ouyang et al.2014] to gather the most informative data/observations for modeling and predicting spatially varying environmental fields that are characterized by *continuous-valued*, *spatially correlated* measurements.
Application domains (e.g., environmental sensing and monitoring) requiring such algorithms
often contain multiple fields of interest: (a) Autonomous underwater and surface vehicles are tasked to sample ocean and freshwater phenomena including temperature, salinity, and oxygen concentration fields
[Dolan et al.2009, Podnar et al.2010],
(b) indoor environments are spanned by temperature, light, and carbon dioxide concentration fields that affect the occupants’ comfort and satisfaction towards the environmental quality across different areas,
and (c) WiFi access points/hotspots situated at neighboring locations produce different but overlapping wireless signal strength fields over the same environment.
These algorithms operate with an assumption that the locations of every robot and its gathered observations are known and provided by its onboard sensors such as the widely-used GPS device.
However, GPS signals may be noisy (e.g., due to urban canyon effect between tall buildings) or unavailable (e.g., in underwater or indoor environments).
So, it is desirable to alternatively consider exploiting the spatially correlated measurements taken by each robot for localizing itself within the environmental fields during its exploration; this will significantly extend the range of environments and application domains in which a robot can localize itself.

To achieve this, our robotics community will usually make use of a probabilistic state estimation framework known as the

*Bayes filter*: It repeatedly updates the belief of a robot’s location/state by assimilating the field measurements taken during the robot’s exploration through its

*observation model*. To preserve time efficiency, the Bayes filter imposes a Markov property on the observation model: Given the robot’s current location, its current measurement is conditionally independent of the past measurements. Such a Markov property is severely violated by the spatial correlation structure of the environmental fields, thus strongly degrading the robot’s localization performance. To resolve this issue, the works of Ko2008 Ko2008,Ko2009 have integrated a rich class of Bayesian nonparametric models called the

*Gaussian process*(GP) into the Bayes filter, which allows the spatial correlation structure between measurements to be formally characterized (i.e., by modeling each field as a GP) and the observation model to be represented by fully probabilistic predictive distributions (i.e., one per field/GP) with formal measures of the uncertainty of the predictions.

Unfortunately, such expressive power of a GP comes at a high computational cost, which hinders its practical use in the Bayes filter for persistent robot localization: It incurs cubic time and quadratic memory in the size of the data/observations.
Existing works [Brooks, Makarenko, and
Upcroft2008, Ferris, Hähnel, and
Fox2006, Ferris, Fox, and Lawrence2007, Ko and Fox2009a, Ko and Fox2009b]
have sidestepped this computational difficulty by assuming the availability of data/observations *prior* to exploration and localization for training the GP observation model offline; some [Brooks, Makarenko, and
Upcroft2008, Ferris, Hähnel, and
Fox2006, Ko and Fox2009a]
have assumed these given prior measurements to be labeled with known locations while others [Ferris, Fox, and Lawrence2007, Ko and Fox2009b] have inferred their location labels.
The Markov assumption on the observation model can then be “relaxed” to
conditional independence between the robot’s current measurement and past measurements (i.e., taken during its exploration) given its current location and the trained GPs using prior data/observations, thus improving the efficiency at each filtering step during its exploration to quadratic time in the size of the prior training data.
Any measurement taken during the robot’s actual exploration and localization is thus not used to train the GP observation model.
Such a “relaxed” Markov assumption may hold in certain static environments. However, it becomes highly restrictive and is easily violated in general, practical environmental settings
where, for example,
(a) limited sampling budget (i.e., in terms of energy consumption, mission time, etc.)
forbids the collection of prior training data or only permits extremely sparse prior data to be gathered relative to a large environment, thus resulting in an inaccurately trained GP observation model,
(b) environmental changes invalidate the prior training data, and (c) the robot’s actual exploration path is spatially distant from the prior observations, hence making the trained GP observation model uninformative to its localization.
All these practical considerations motivate us to tackle a fundamental research question:
Without prior training data, how can GPs be restructured to be used by a Bayes filter for *persistent* robot localization in environmental fields characterized by spatially correlated measurements?

This paper presents a *Gaussian process localization* (GP-Localize) algorithm that, in contrast to existing works mentioned above, can exploit the spatially correlated field measurements taken during a robot’s exploration (instead of relying on prior training data) for efficiently and scalably learning the GP observation model online through our proposed novel online sparse GP (Section 3).
As a result, GP-Localize is capable of achieving *constant* time and memory (i.e., independent of the size of the data/observations) per filtering step, which we believe is an important first step towards demonstrating the practical feasibility of employing GPs for persistent robot localization and autonomy.
We empirically demonstrate through simulated experiments with three real-world datasets as well as a real robot experiment that GP-Localize outperforms existing GP localization algorithms (Section 4).

## 2 Background

### 2.1 Modeling Environmental Field with GP

The Gaussian process (GP) can be used to model an environmental field as follows^{1}^{1}1To simplify exposition, we only describe the GP for a single field; for multiple fields, we assume independence between them to ease computations.: The environmental field is defined to vary as a realization of a GP. Let be a set of locations representing the domain of the environmental field such that each location is associated with a realized (random) field measurement if is observed (unobserved). Let denote a GP, that is, every finite subset of

has a multivariate Gaussian distribution

[Rasmussen and Williams2006]. The GP is fully specified by its*prior*mean and covariance cov[] for all , the latter of which characterizes the spatial correlation structure of the field and can be defined using a covariance function. A common choice is the squared exponential covariance function where and

are, respectively, the signal and noise variance controlling the intensity and the noise of the measurements,

is a diagonal matrix with length-scale components and controlling, respectively, the degree of spatial correlation or “similarity” between measurements in the horizontal and vertical directions of the field, and is a Kronecker delta of value if , and 0 otherwise.A chief advantage of using the full GP to model the environmental field is its capability of performing probabilistic regression: Supposing a robot has visited and observed a set

of locations and taken a column vector

of corresponding realized measurements, the full GP can exploit these observations to predict the measurement at any unobserved location as well as provide its corresponding predictive uncertainty using a Gaussian predictive distribution with the following*posterior*mean and variance, respectively:

(1) |

(2) |

where is a column vector with mean components for all , is a row vector with covariance components for all , is the transpose of , and is a matrix with components for all .

### 2.2 Sparse Gaussian Process Approximation

The key limitation hindering the practical use of the full GP in the Bayes filter for persistent robot localization is its poor scalability in the size of the data/observations: Computing the Gaussian predictive distribution (i.e., (1) and (2)) requires inverting the covariance matrix , which incurs time and memory. To improve its scalability, GP approximation methods [Chen et al.2012, Chen et al.2013, Quiñonero-Candela and Rasmussen2005] have been proposed, two of which will be described below.

The simple
sparse *subset of data* (SoD) approximation method uses only a subset of the
set of locations (i.e., ) observed and the realized measurements taken by the robot
to produce a Gaussian predictive distribution of the measurement at any unobserved location with the following posterior mean and variance, which are similar to that of full GP
(i.e., by replacing in (1) and (2) with ):

(3) |

(4) |

The covariance matrix is inverted using time and memory, which are independent of . The main criticism of SoD is that it does not exploit all the data for computing the Gaussian predictive distribution, thus yielding an unrealistic overestimate (4) of the predictive uncertainty (even with fairly redundant data and informative subset ) [Quiñonero-Candela and Rasmussen2005] and in turn an inaccurately trained observation model.

The sparse *partially independent training conditional* (PITC) approximation method is the most general form of a class of reduced-rank covariance matrix approximation methods reported in [Quiñonero-Candela and
Rasmussen2005] exploiting the notion of a support set .
Unlike SoD, PITC can utilize all data (i.e., and ) to derive a Gaussian predictive distribution of the measurement at any with the following posterior mean and variance:

(5) |

(6) |

where for all and is a block-diagonal matrix constructed from the diagonal blocks of , each of which is a matrix for where .
Also, unlike SoD, the support set does not have to be observed.
The covariance matrix in (1) and (2) is approximated by a reduced-rank matrix summed with the resulting sparsified residual matrix in (5) and (6).
So, computing either (5) or (6), which requires inverting
the approximated covariance matrix , incurs time and memory.
The sparse *fully independent training conditional* (FITC) approximation method is a special case of PITC where is a diagonal matrix constructed from for all (i.e., ). FITC is previously employed by Ko2008 Ko2008 to speed up the learning of observation model with prior training data.
But, the time incurred by PITC or FITC grows with increasing size of data. So, it is computationally impractical to use them directly to repeatedly train the observation model at each filtering step for persistent localization.

### 2.3 Bayes Filters

A Bayes filter is a probabilistic state estimation framework that repeatedly updates the belief of a robot’s location/state by conditioning on its control actions performed and field measurements taken so far.
Formally,
let the robot’s control action performed, its location visited and observed, and the corresponding realized field measurement taken at time/filtering step be denoted by , , and ^{2}^{2}2The field measurement is indexed by time step instead of the corresponding location since is not known to the robot., respectively.
To estimate the robot’s location, a belief is maintained over all its possible locations where
and
denote, respectively, column vectors of past control actions performed
and realized field measurements taken by the robot
up until time step .
To track such a belief, after the robot has performed an action and taken a realized measurement at each time step , the Bayes filter updates the prior belief of the robot’s location to the posterior belief
where is a normalizing constant,
is a *motion model*

representing the probability of the robot moving from locations

to after performing action , and is an*observation model*describing the likelihood of taking realized measurement at location .

To preserve efficiency, the Bayes filter imposes a Markov property on the observation model: Given the robot’s current location , its current measurement is conditionally independent of past actions and measurements :

(7) |

In other words, the robot’s past actions performed and measurements taken during its exploration and localization are not exploited for learning the observation model. This is conventionally assumed by existing works either representing the observation model using a parametric model with known parameters

[Thrun, Burgard, and Fox2005] or training it offline using prior training data. The disadvantages of the former are extensively discussed by Ko2008 Ko2008 while that of the latter are already detailed in Section 1.In the case of multiple fields (say, of them), let denote the realized measurement taken from field at location for . Then, the observation model becomes such that the equality follows from an assumption of independence of measurements between fields to ease computations.

## 3 Online Sparse GP Observation Model

In contrast to existing works discussed in Section 1, our GP-Localize algorithm does not need to impose the restrictive Markov property (7) on the observation model, which can then be derived by marginalizing out the random locations visited and observed by the robot up until time step :

(8) |

where is a normalizing constant, is the belief of the robot’s initial location at time step , denotes a set of locations visited and observed by the robot up until time step , and is a Gaussian predictive distribution provided by the GP (Section 2.1). The derivation of (8) is in Appendix A.

To make computations tractable but not constrain the type of motion model that can be specified, the observation model (8) is approximated using Monte Carlo integration:

(9) |

where denotes a -th sample path simulated by first drawing the robot’s initial location from and then sampling from motion model for given its past actions while ensuring , as observed in (8). For a practical implementation, instead of re-simulating the entire sample paths (hence, incurring linear time in ) at each time step, each -th sample path is incrementally updated from (i.e., obtained in previous time step) to (i.e., needed in current time step) by including sampled from motion model without accounting for motion constraint . As a result, the time spent in incrementally updating the sample paths at each time step is independent of . To mitigate the effect of ignoring the constraint, we introduce a strategy in Remark after Theorem 1 that exploits a structural property of our proposed online sparse GP. In practice, such an implementation yields considerable time savings (i.e., time independent of ) and does not result in poor localization performance empirically (Section 4).

The scalability of our GP-Localize algorithm therefore depends on whether the Gaussian predictive probability in (9) can be derived efficiently. Computing it with full GP, PITC, or FITC (Section 2) directly incurs, respectively, , , and time. Since is expected to be large for persistent localization, it is computationally impractical to use these offline full GP and sparse GP approximation methods to repeatedly train the observation model at each filtering step. Even when the online GP proposed by Csato02 Csato02 is used, it still incurs quadratic time in per filtering step. In the following subsection, we will propose an online sparse GP that can achieve constant time (i.e., independent of ) at each filtering step .

### 3.1 Online Sparse GP Approximation

The key idea underlying our proposed online sparse GP is to summarize the newly gathered data/observations at regular time intervals/slices, assimilate the summary information of the new data with that of all the previously gathered data/observations, and then exploit the resulting assimilated summary information to compute the Gaussian predictive probability in (9). The details of our proposed online sparse GP will be described next.

Let each time slice span time/filtering steps to for some user-defined slice size and the number of time slices available thus far up until time step be denoted by (i.e., ).

###### Definition 1 (Slice Summary)

Given a support set common to all sample paths, the subset of the -th sample path simulated during the time slice , and the column vector of corresponding realized measurements taken by the robot, the slice summary of time slice is defined as a tuple for where

such that is defined in a similar manner as in (1) and is a posterior covariance matrix with components for all , each of which is defined in a similar way as (4).

*Remark*. The support set of locations does not have to be observed because the slice summary is independent of
.
So, the support set can be selected *prior* to exploration and localization from

using an offline greedy active learning algorithm such as

[Krause, Singh, and Guestrin2008].###### Definition 2 (Assimilated Summary)

Given , the assimilated summary of time slices to is updated from the assimilated summary of time slices to using and for where and .

*Remark* . After constructing and assimilating
with