I Introduction
State estimation is an essential component of almost all robotic systems. Accurate and consistent state estimates not only assist with high-level decision-making, but also directly improve the performance of other modules, such as planning and control. While there are many established estimation techniques for linear-Gaussian systems, classical techniques for nonlinear state estimation still pose challenges, requiring complex modelling of the system and/or problem-specific estimation techniques. Model-free estimation algorithms, in contrast, learn key aspects of the system models from training data, then perform estimation on similarly distributed test data. They can be applied to a wider variety of systems without requiring a priori models. Current methods often involve lifting the system into a higher-dimensional space, either with kernel embeddings of probability distributions in a Reproducing Kernel Hilbert Space (RKHS) or by approximating the Koopman operator. However, kernel embedding methods suffer from poor scalability, and Koopman-based methods usually require problem-specific basis functions. Furthermore, most Koopman-based methods learn a lifted linear realization of the system, but many systems in robotics are control-affine, which admits a lifted bilinear realization but not necessarily a linear one [controlaffinetobilin]. As well, to the best of our knowledge, there are no data-driven algorithms that formulate the problem as a clean batch linear state-estimation framework, thus limiting their use in robotic applications.

In this work, we propose the Koopman State Estimator (KoopSE), a state estimation framework that combines aspects of kernel embedding and Koopman-based methods in a novel way such that it

1) is applicable to control-affine systems with no prior knowledge of the process and measurement models,

2) formulates the problem as a high-dimensional batch linear state estimation framework, which admits solutions for state means and covariances, and

3) has a training cost that scales linearly with the amount of data and an inference cost that is independent of the amount of training data, in contrast with standard kernel methods.
This paper is structured as follows. After reviewing related work in Section II, we summarize the theory of kernel embeddings and of Koopman representations of control-affine systems in Section III. We derive the exact KoopSE algorithm through Sections IV to VI, then apply the RFF approximation in Section VII. We present experimental results in Section VIII and conclude in Section IX.
II Related Work
Methods involving kernels and RKHS are becoming widely used in machine learning and robotics [Hofmann], allowing for model-free state estimation. Kernel mean embeddings represent probability distributions as elements of an RKHS, enabling operations on random variables in a higher-dimensional space [song]. The kernelized version of Bayes' rule was first used by [kbr] to build the kernelized Bayes filter, which was subsequently extended to a kernelized smoother by [ksmoother]. Although these methods can exactly learn nonlinear systems, they scale cubically with the number of training samples. To reduce the computational cost, [kkr] used regularization assumptions and subspace projections to construct the Kernel Kalman Filter, which instead scales linearly with training data. Kernel embedding methods in general have a prediction cost that scales poorly with the amount of training data and are applicable only to systems without control inputs. Other kernelized methods such as Gaussian Process (GP) regression [gpbook] have also been used [GPBayes], but GPs assume additive Gaussian noise on the motion and measurement models, performing poorly compared to kernel embedding methods for systems with multimodal noise [GPnoiseMulti].

In contrast to kernel-embedding methods, Koopman-based methods work in the feature space directly, lifting the states into higher dimensions using a set of basis functions (i.e., features) [Koopman], [mauroy_2020_koopman]. The system matrices can be computed efficiently using extended Dynamic Mode Decomposition (EDMD) [dmdbook], [dmdbigbook], and the cost of inference at test time is constant with respect to the number of training points. However, most techniques use basis functions tailored to their specific problems [koopmanso3], [koopmancontrol], limiting their generality. It was shown by [controlaffinetobilin] that a deterministic control-affine system can be exactly represented as a lifted bilinear system using a Koopman operator, and the choice of basis functions can be taken as polynomial, Fourier, or other generic sets of features. Although [controlaffinetobilin] did not consider sensor measurements nor state covariances, similar to the majority of Koopman-based methods, this equivalence will be foundational for our KoopSE framework.
A variant of Fourier features is Random Fourier Features (RFF) [rff], a set of probabilistic features for approximating the associated kernel functions. RFF has been effective for various robotics problems, such as continuous occupancy mapping [hilbertmaps] and robot dynamics learning [rfflearndynamics]. Although [koopmanwithrff] demonstrated that RFF can feasibly approximate the Koopman operator, the features were again used directly in EDMD. We instead leverage the connection between RFF and kernel embeddings in our work, combining the efficiency of Koopmanbased methods with the generality of kernelembedding methods.
III Preliminaries
III-A Reproducing Kernel Hilbert Space (RKHS) Embeddings
Consider a general nonlinear system of the form

(1a) $\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{u}_k, \mathbf{w}_k)$,
(1b) $\mathbf{y}_k = \mathbf{g}(\mathbf{x}_k, \mathbf{n}_k)$,

where $\mathbf{x}_k$ is the state, $\mathbf{u}_k$ the control input, $\mathbf{w}_k$ the process noise, $\mathbf{y}_k$ the measurement output, and $\mathbf{n}_k$ the measurement noise, all at timestep $k$. We embed each of the state, input, and measurements in an appropriate RKHS [rkhs], [rkhsbook],
(2) $\tilde{\mathbf{x}} = \boldsymbol{\phi}(\mathbf{x}), \quad \tilde{\mathbf{u}} = \boldsymbol{\upsilon}(\mathbf{u}), \quad \tilde{\mathbf{y}} = \boldsymbol{\theta}(\mathbf{y})$,
where
(3) $\boldsymbol{\phi} : \mathcal{X} \to \mathcal{H}_x, \quad \boldsymbol{\upsilon} : \mathcal{U} \to \mathcal{H}_u, \quad \boldsymbol{\theta} : \mathcal{Y} \to \mathcal{H}_y$
are the embeddings (feature maps) associated with the kernels of $\mathcal{X}$, $\mathcal{U}$, and $\mathcal{Y}$, respectively. Note that the embeddings may be infinite dimensional. Unlike Koopman-based methods, which do not restrict the type of the embedding space, an RKHS embeds entire distributions of random variables in the higher-dimensional space. The distribution of $\mathbf{x}$ is embedded by the mean map [kernelembeddings]
(4) $\boldsymbol{\mu}_{\mathbf{x}} = E[\boldsymbol{\phi}(\mathbf{x})]$,
with $E[\cdot]$ as the expectation operator. So long as the kernel associated with $\boldsymbol{\phi}$ is characteristic, the mean map allows us to represent any distribution over $\mathcal{X}$. Given a finite set of samples of the random variable, $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, we can approximate the mean map as
(5) $\boldsymbol{\mu}_{\mathbf{x}} \approx \sum_{i=1}^{N} \alpha_i \boldsymbol{\phi}(\mathbf{x}_i)$,
where the weights $\alpha_i$ depend on how the samples were drawn. We can also rewrite this as $\boldsymbol{\mu}_{\mathbf{x}} \approx \tilde{\mathbf{X}} \boldsymbol{\alpha}$, where
(6) $\tilde{\mathbf{X}} = \begin{bmatrix} \boldsymbol{\phi}(\mathbf{x}_1) & \cdots & \boldsymbol{\phi}(\mathbf{x}_N) \end{bmatrix}, \quad \boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 & \cdots & \alpha_N \end{bmatrix}^T$.
We see the samples, $\boldsymbol{\phi}(\mathbf{x}_i)$, acting as a basis for $\boldsymbol{\mu}_{\mathbf{x}}$ with weights $\alpha_i$. If we then want to calculate the expectation of any function of $\mathbf{x}$, the mean map allows us to do this as
(7) $E[f(\mathbf{x})] \approx \sum_{i=1}^{N} \alpha_i f(\mathbf{x}_i)$,
for some nonlinear function $f(\cdot)$. In particular, if $f(\cdot)$ is the identity function, then we have
(8) $E[\mathbf{x}] \approx \sum_{i=1}^{N} \alpha_i \mathbf{x}_i = \mathbf{X} \boldsymbol{\alpha}$,
as expected, where $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_N \end{bmatrix}$. This is the Representer Theorem [representerthm], and we can use this property to recover the mean of our random variable in the original space after doing calculations in the RKHS, eliminating the need to find inverse transformations from the lifted to the original space.
In addition, we can define a centered covariance map [kernelembeddings] similarly to the mean map as
(9) $\boldsymbol{\Sigma}_{\mathbf{xx}} = E\left[ (\boldsymbol{\phi}(\mathbf{x}) - \boldsymbol{\mu}_{\mathbf{x}})(\boldsymbol{\phi}(\mathbf{x}) - \boldsymbol{\mu}_{\mathbf{x}})^T \right]$,
which can also represent any joint distribution over $\mathcal{X} \times \mathcal{X}$ provided the kernel associated with $\boldsymbol{\phi}$ is characteristic. Given a finite set of samples from the distribution, we can approximate the covariance map as $\boldsymbol{\Sigma}_{\mathbf{xx}} \approx \tilde{\mathbf{X}} \mathbf{\Lambda} \tilde{\mathbf{X}}^T$, where $\mathbf{\Lambda}$ is a weight matrix whose values depend on how the samples were drawn. Unlike Koopman approaches, these maps allow for the recovery of covariances of random variables from the lifted space.

The main idea that we will pursue in this paper is to use the embeddings in (2) to embed a control-affine system in an RKHS where we can write it as a bilinear system. In Sections IV and V, we assume that we can explicitly lift quantities into their respective RKHS embeddings, even though these are potentially infinite dimensional. We will later see in Section VII how these derivations still hold under RFF approximations of the feature maps.
III-B Lifting of Control-Affine Systems
Many robotic systems can be written in control-affine form affected by process and measurement noise,
(10a) $\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}) + \mathbf{G}(\mathbf{x}_{k-1})\mathbf{u}_k + \mathbf{w}_k$,
(10b) $\mathbf{y}_k = \mathbf{g}(\mathbf{x}_k) + \mathbf{n}_k$,
where $\mathbf{x}_k$, $\mathbf{u}_k$, $\mathbf{w}_k$, $\mathbf{y}_k$, $\mathbf{n}_k$ represent the same quantities as in the general nonlinear system in (1), and $\mathbf{f}(\cdot)$, $\mathbf{G}(\cdot)$, $\mathbf{g}(\cdot)$ are the various components of the dynamic model. We look to lift this system into an RKHS. With a sufficiently rich feature map, $\boldsymbol{\phi}$, a deterministic control-affine motion model (i.e., $\mathbf{w}_k = \mathbf{0}$) can be written exactly as a bilinear model in a lifted space [controlaffinetobilin],
(11) $\boldsymbol{\phi}(\mathbf{x}_k) = \mathbf{A}\boldsymbol{\phi}(\mathbf{x}_{k-1}) + \mathbf{B}\boldsymbol{\upsilon}(\mathbf{u}_k) + \mathbf{H}\left( \boldsymbol{\phi}(\mathbf{x}_{k-1}) \otimes \boldsymbol{\upsilon}(\mathbf{u}_k) \right)$,
where $\otimes$ represents the tensor product, equivalent to the Kronecker product if the embedding is finite-dimensional. We assume that this result holds fairly well for a stochastic system, where the lifted noise becomes additive and Gaussian. This is reasonable, as the combination of various sources of random and systematic errors in very high dimensions likely approaches a Gaussian under the Central Limit Theorem. The original equivalence by [controlaffinetobilin] used the unlifted control input, $\mathbf{u}_k$, in the lifted space, but we will use its lifted counterpart, $\boldsymbol{\upsilon}(\mathbf{u}_k)$, since the equivalence still holds if $\boldsymbol{\upsilon}$ is sufficiently rich.

For the measurement model, we assume that the lifted model is linear in the deterministic case,
(12) $\boldsymbol{\theta}(\mathbf{y}_k) = \mathbf{C}\boldsymbol{\phi}(\mathbf{x}_k)$,
and we make a similar additive-Gaussian assumption for the measurement noise. The resulting time-invariant stochastic bilinear system in the lifted space can be written as
(13a) $\tilde{\mathbf{x}}_k = \mathbf{A}\tilde{\mathbf{x}}_{k-1} + \mathbf{B}\tilde{\mathbf{u}}_k + \mathbf{H}\left( \tilde{\mathbf{x}}_{k-1} \otimes \tilde{\mathbf{u}}_k \right) + \tilde{\mathbf{w}}_k$,
(13b) $\tilde{\mathbf{y}}_k = \mathbf{C}\tilde{\mathbf{x}}_k + \tilde{\mathbf{n}}_k$,
where $\tilde{\mathbf{w}}_k$ and $\tilde{\mathbf{n}}_k$ are the process and measurement noises, respectively. We also have
(14a) $\tilde{\mathbf{x}}_k = \boldsymbol{\phi}(\mathbf{x}_k), \quad \tilde{\mathbf{u}}_k = \boldsymbol{\upsilon}(\mathbf{u}_k), \quad \tilde{\mathbf{y}}_k = \boldsymbol{\theta}(\mathbf{y}_k)$,
(14b) $\tilde{\mathbf{w}}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$,
(14c) $\tilde{\mathbf{n}}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$.
This lifted system, with a bilinear motion model and a linear measurement model, is significantly easier to work with than the general control-affine system in (10). However, since we assume the system model is not given in either form (original or lifted), we first need a method of learning the lifted model from data.
IV System Identification
IV-A Lifted Matrix Form of Dataset
Our objective is to learn the lifted system matrices in (13) from data. To this end, we assume a dataset of the control-affine system, including the ground-truth state transitions with their associated control inputs and measurements for $N$ states: $\mathcal{D} = \{ (\mathbf{x}_i, \mathbf{u}_i, \mathbf{x}'_i, \mathbf{y}_i) \}_{i=1}^{N}$. Here, $\mathbf{x}_i$ transitions to $\mathbf{x}'_i$ under input $\mathbf{u}_i$, and receives a measurement $\mathbf{y}_i$ at $\mathbf{x}'_i$. This format allows for data from one or multiple training trajectories to be used at once. If the dataset consists of a single trajectory of states and $i$ represents the timestep, then we would set $\mathbf{x}'_i = \mathbf{x}_{i+1}$. In any case, we write the data neatly in block-matrix form:
(15a) $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_N \end{bmatrix}, \quad \mathbf{X}' = \begin{bmatrix} \mathbf{x}'_1 & \cdots & \mathbf{x}'_N \end{bmatrix}$,
(15b) $\mathbf{U} = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_N \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} \mathbf{y}_1 & \cdots & \mathbf{y}_N \end{bmatrix}$.
The data translate to the lifted space such that
(16a) $\boldsymbol{\phi}(\mathbf{x}'_i) = \mathbf{A}\boldsymbol{\phi}(\mathbf{x}_i) + \mathbf{B}\boldsymbol{\upsilon}(\mathbf{u}_i) + \mathbf{H}\left( \boldsymbol{\phi}(\mathbf{x}_i) \otimes \boldsymbol{\upsilon}(\mathbf{u}_i) \right) + \mathbf{w}_i$,
(16b) $\boldsymbol{\theta}(\mathbf{y}_i) = \mathbf{C}\boldsymbol{\phi}(\mathbf{x}'_i) + \mathbf{n}_i$,
for some unknown noise, $\mathbf{w}_i$, $\mathbf{n}_i$. We rewrite the lifted versions of the data and the noises in block-matrix form:
(17a) $\tilde{\mathbf{X}} = \begin{bmatrix} \boldsymbol{\phi}(\mathbf{x}_1) & \cdots & \boldsymbol{\phi}(\mathbf{x}_N) \end{bmatrix}, \quad \tilde{\mathbf{X}}' = \begin{bmatrix} \boldsymbol{\phi}(\mathbf{x}'_1) & \cdots & \boldsymbol{\phi}(\mathbf{x}'_N) \end{bmatrix}$,
(17b) $\tilde{\mathbf{U}} = \begin{bmatrix} \boldsymbol{\upsilon}(\mathbf{u}_1) & \cdots & \boldsymbol{\upsilon}(\mathbf{u}_N) \end{bmatrix}, \quad \tilde{\mathbf{Y}} = \begin{bmatrix} \boldsymbol{\theta}(\mathbf{y}_1) & \cdots & \boldsymbol{\theta}(\mathbf{y}_N) \end{bmatrix}$,
(17c) $\tilde{\mathbf{W}} = \begin{bmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_N \end{bmatrix}, \quad \tilde{\mathbf{N}} = \begin{bmatrix} \mathbf{n}_1 & \cdots & \mathbf{n}_N \end{bmatrix}$.
The lifted matrix form of the system for this dataset is
(18a) $\tilde{\mathbf{X}}' = \mathbf{A}\tilde{\mathbf{X}} + \mathbf{B}\tilde{\mathbf{U}} + \mathbf{H}\left( \tilde{\mathbf{X}} \odot \tilde{\mathbf{U}} \right) + \tilde{\mathbf{W}}$,
(18b) $\tilde{\mathbf{Y}} = \mathbf{C}\tilde{\mathbf{X}}' + \tilde{\mathbf{N}}$,
where $\odot$ denotes the Khatri-Rao (column-wise) tensor product.
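The Khatri-Rao product used in (18) can be sketched as follows; the small random matrices stand in for the lifted data blocks and are purely illustrative:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column i of the result is kron(A[:, i], B[:, i])."""
    n = A.shape[1]
    assert B.shape[1] == n, "both factors need one column per data point"
    return np.stack([np.kron(A[:, i], B[:, i]) for i in range(n)], axis=1)

rng = np.random.default_rng(1)
N, dx, du = 5, 3, 2
X_lift = rng.normal(size=(dx, N))   # stand-in for lifted states, columns phi(x_i)
U_lift = rng.normal(size=(du, N))   # stand-in for lifted inputs, columns upsilon(u_i)

XU = khatri_rao(X_lift, U_lift)     # shape (dx*du, N), one bilinear column per point
print(XU.shape)  # (6, 5)
```

Each column pairs one lifted state with its own lifted input, which is exactly the per-data-point bilinear term in (16a) stacked across the dataset.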
IV-B Loss Function
We now design a loss function from which to optimize for the system matrices from the data. Rather than using the EDMD framework of forming the matrices with major Koopman modes, we use Tikhonov regularization for a cleaner formulation. The model-learning problem is posed as
(19) $\min_{\mathbf{A}, \mathbf{B}, \mathbf{H}, \mathbf{C}, \mathbf{Q}, \mathbf{R}} J$,
where the loss function, $J = J_{\text{data}} + J_{\text{prior}}$, is the sum of
(20a) $J_{\text{data}} = \frac{1}{2}\left\| \tilde{\mathbf{X}}' - \mathbf{A}\tilde{\mathbf{X}} - \mathbf{B}\tilde{\mathbf{U}} - \mathbf{H}(\tilde{\mathbf{X}} \odot \tilde{\mathbf{U}}) \right\|^2_{\mathbf{Q}^{-1}} + \frac{1}{2}\left\| \tilde{\mathbf{Y}} - \mathbf{C}\tilde{\mathbf{X}}' \right\|^2_{\mathbf{R}^{-1}} + \frac{N}{2}\ln|\mathbf{Q}| + \frac{N}{2}\ln|\mathbf{R}|$,
(20b) $J_{\text{prior}} = \frac{\gamma_1}{2}\|\mathbf{A}\|^2_{\mathbf{Q}^{-1}} + \frac{\gamma_2}{2}\|\mathbf{B}\|^2_{\mathbf{Q}^{-1}} + \frac{\gamma_3}{2}\|\mathbf{H}\|^2_{\mathbf{Q}^{-1}} + \frac{\gamma_4}{2}\|\mathbf{C}\|^2_{\mathbf{R}^{-1}} + \frac{\gamma_5}{2}\operatorname{tr}(\mathbf{Q}^{-1}) + \frac{\gamma_6}{2}\operatorname{tr}(\mathbf{R}^{-1})$.
Here, the norm is a weighted Frobenius matrix norm: $\|\mathbf{M}\|^2_{\mathbf{W}} = \operatorname{tr}(\mathbf{M}^T\mathbf{W}\mathbf{M})$. $J_{\text{data}}$ represents the negative log-likelihood of the Bayesian posterior from fitting the data. $J_{\text{prior}}$ contains prior terms over the matrices, where the first four terms encourage the description length of $\mathbf{A}$, $\mathbf{B}$, $\mathbf{H}$, and $\mathbf{C}$ to be minimal, while the last two are (isotropic) inverse-Wishart (IW) priors for the covariances $\mathbf{Q}$ and $\mathbf{R}$. IW distributions have been demonstrated to be robust priors for learning covariances [esgviextended]. The regularizing hyperparameters, $\gamma_1, \ldots, \gamma_6$, will later be tuned according to the data gathered.

We find the critical points by setting the derivatives of $J$ with respect to the model parameters $\mathbf{A}$, $\mathbf{B}$, $\mathbf{H}$, $\mathbf{C}$, $\mathbf{Q}$, and $\mathbf{R}$ to zero. We define
(21) $\mathbf{Z} = \begin{bmatrix} \tilde{\mathbf{X}} \\ \tilde{\mathbf{U}} \\ \tilde{\mathbf{X}} \odot \tilde{\mathbf{U}} \end{bmatrix}$.
This yields the following expressions:
(22a) $\begin{bmatrix} \mathbf{A} & \mathbf{B} & \mathbf{H} \end{bmatrix}\left( \mathbf{Z}\mathbf{Z}^T + \boldsymbol{\Gamma} \right) = \tilde{\mathbf{X}}'\mathbf{Z}^T, \quad \boldsymbol{\Gamma} = \operatorname{blkdiag}(\gamma_1\mathbf{1}, \gamma_2\mathbf{1}, \gamma_3\mathbf{1})$,
(22b) $\mathbf{C}\left( \tilde{\mathbf{X}}'\tilde{\mathbf{X}}'^T + \gamma_4\mathbf{1} \right) = \tilde{\mathbf{Y}}\tilde{\mathbf{X}}'^T$,
(22c) $\mathbf{Q} = \frac{1}{N}\left( \tilde{\mathbf{E}}\tilde{\mathbf{E}}^T + \gamma_1\mathbf{A}\mathbf{A}^T + \gamma_2\mathbf{B}\mathbf{B}^T + \gamma_3\mathbf{H}\mathbf{H}^T + \gamma_5\mathbf{1} \right), \quad \tilde{\mathbf{E}} = \tilde{\mathbf{X}}' - \begin{bmatrix} \mathbf{A} & \mathbf{B} & \mathbf{H} \end{bmatrix}\mathbf{Z}$,
(22d) $\mathbf{R} = \frac{1}{N}\left( \tilde{\mathbf{F}}\tilde{\mathbf{F}}^T + \gamma_4\mathbf{C}\mathbf{C}^T + \gamma_6\mathbf{1} \right), \quad \tilde{\mathbf{F}} = \tilde{\mathbf{Y}} - \mathbf{C}\tilde{\mathbf{X}}'$,
where $\mathbf{1}$ represents the identity operator for the appropriate domains. We can solve for $\mathbf{A}$, $\mathbf{B}$, $\mathbf{H}$, and $\mathbf{C}$ through solving systems of linear equations, then use these results to find $\mathbf{Q}$ and $\mathbf{R}$. This procedure is linear in the amount of training data, $N$, for both computation and storage.
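To make the regularized model-fitting step concrete, here is a simplified sketch in the spirit of (19) to (22), using a single ridge parameter `lam` in place of the paper's full set of hyperparameters and IW priors; all matrix names and dimensions are illustrative:

```python
import numpy as np

def khatri_rao(A, B):
    n = A.shape[1]
    return np.stack([np.kron(A[:, i], B[:, i]) for i in range(n)], axis=1)

def fit_bilinear(X, Xp, U, Y, lam=1e-3):
    """Ridge-regression sketch of the system-identification step.

    X, Xp : lifted states before/after transition, shape (dx, N)
    U     : lifted inputs, shape (du, N)
    Y     : lifted measurements at Xp, shape (dy, N)
    Fits Xp ~ A X + B U + H (X kr U) and Y ~ C Xp, then forms
    crude residual-based covariance estimates Q and R.
    """
    dx, N = X.shape
    Z = np.vstack([X, U, khatri_rao(X, U)])          # stacked regressors, as in (21)
    ABH = Xp @ Z.T @ np.linalg.inv(Z @ Z.T + lam * np.eye(Z.shape[0]))
    A, B = ABH[:, :dx], ABH[:, dx:dx + U.shape[0]]
    H = ABH[:, dx + U.shape[0]:]
    C = Y @ Xp.T @ np.linalg.inv(Xp @ Xp.T + lam * np.eye(dx))
    Ew = Xp - ABH @ Z                                 # motion residuals
    En = Y - C @ Xp                                   # measurement residuals
    Q = (Ew @ Ew.T + lam * np.eye(dx)) / N
    R = (En @ En.T + lam * np.eye(Y.shape[0])) / N
    return A, B, H, C, Q, R

# Tiny synthetic check: data generated by a known bilinear system is recovered.
rng = np.random.default_rng(2)
dx, du, dy, N = 3, 2, 2, 500
A0 = 0.5 * rng.normal(size=(dx, dx))
B0 = rng.normal(size=(dx, du))
H0 = 0.1 * rng.normal(size=(dx, dx * du))
C0 = rng.normal(size=(dy, dx))
X = rng.normal(size=(dx, N))
U = rng.normal(size=(du, N))
Xp = A0 @ X + B0 @ U + H0 @ khatri_rao(X, U)
Y = C0 @ Xp
A, B, H, C, Q, R = fit_bilinear(X, Xp, U, Y)
print(np.max(np.abs(A - A0)))  # small
```

Note that the per-datapoint cost here is dominated by accumulating `Z @ Z.T`, which is why training scales linearly with N.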
V Batch Linear State Estimation
Having learned the system matrices from training data, we now wish to solve for a sequence of test states, $\mathbf{x}_{0:K}$, given a series of inputs, $\mathbf{u}_{1:K}$, and measurements, $\mathbf{y}_{1:K}$, at test time. We do this in the lifted space, where the quantities become $\tilde{\mathbf{x}}_{0:K}$, $\tilde{\mathbf{u}}_{1:K}$, and $\tilde{\mathbf{y}}_{1:K}$, respectively. The main insight is that since the inputs are completely determined at test time, we can manipulate the lifted time-invariant bilinear form of (13) into a lifted time-varying linear form. We observe that
(23) $\tilde{\mathbf{x}}_{k-1} \otimes \tilde{\mathbf{u}}_k = \left( \mathbf{1} \otimes \tilde{\mathbf{u}}_k \right)\tilde{\mathbf{x}}_{k-1}$,
where here $\tilde{\mathbf{u}}_k = \boldsymbol{\upsilon}(\mathbf{u}_k)$. The motion model for the test trajectory becomes, for $k = 1, \ldots, K$,
(24) $\tilde{\mathbf{x}}_k = \left( \mathbf{A} + \mathbf{H}(\mathbf{1} \otimes \tilde{\mathbf{u}}_k) \right)\tilde{\mathbf{x}}_{k-1} + \mathbf{B}\tilde{\mathbf{u}}_k + \tilde{\mathbf{w}}_k$.
As $\tilde{\mathbf{u}}_k$ is given at test time, we define a new time-varying system matrix and input as
(25) $\mathbf{A}_k = \mathbf{A} + \mathbf{H}\left( \mathbf{1} \otimes \tilde{\mathbf{u}}_k \right), \quad \tilde{\mathbf{v}}_k = \mathbf{B}\tilde{\mathbf{u}}_k$.
With this, we have converted the bilinear system into a linear time-varying (LTV) system, governed by
(26a) $\tilde{\mathbf{x}}_k = \mathbf{A}_k\tilde{\mathbf{x}}_{k-1} + \tilde{\mathbf{v}}_k + \tilde{\mathbf{w}}_k$,
(26b) $\tilde{\mathbf{y}}_k = \mathbf{C}\tilde{\mathbf{x}}_k + \tilde{\mathbf{n}}_k$,
where $\tilde{\mathbf{w}}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$ and $\tilde{\mathbf{n}}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. This is the well-established batch state-estimation problem on linear-Gaussian systems [barfoottxtbk]. The solution is in the form of
(27) $p\left( \tilde{\mathbf{x}}_k \,|\, \tilde{\mathbf{v}}_{1:K}, \tilde{\mathbf{y}}_{1:K} \right) = \mathcal{N}\left( \hat{\mathbf{x}}_k, \hat{\mathbf{P}}_k \right)$,
where $\hat{\mathbf{x}}_k$ and $\hat{\mathbf{P}}_k$ are, respectively, the mean and covariance estimates for the test trajectory. One popular method for solving (26) is the Rauch-Tung-Striebel (RTS) smoother, but there are other efficient methods for obtaining exact solutions [barfoottxtbk].
VI Recovering Estimates from Lifted Space
Having solved for the state estimates and covariances in the RKHS, we now wish to recover these quantities in the original space. Using the Representer Theorem [representerthm], since the solution of the LTV system in (26) is the result of a linear optimization problem, it must be spanned by the training data used to form the system matrices. We can thus write the state and covariance outputs of each timestep as
(28) $\hat{\mathbf{x}}_k = \tilde{\mathbf{X}}'\boldsymbol{\lambda}_k, \quad \hat{\mathbf{P}}_k = \tilde{\mathbf{X}}'\mathbf{\Lambda}_k\tilde{\mathbf{X}}'^T$,
where $\boldsymbol{\lambda}_k$, $\mathbf{\Lambda}_k$ consist of the appropriate weights, which we solve using the left pseudoinverse with a small hyperparameter $\gamma$ to regularize the inversion of $\tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}'$:
(29a) $\boldsymbol{\lambda}_k = \left( \tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}' + \gamma\mathbf{1} \right)^{-1}\tilde{\mathbf{X}}'^T\hat{\mathbf{x}}_k$,
(29b) $\mathbf{\Lambda}_k = \left( \tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}' + \gamma\mathbf{1} \right)^{-1}\tilde{\mathbf{X}}'^T\hat{\mathbf{P}}_k\tilde{\mathbf{X}}'\left( \tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}' + \gamma\mathbf{1} \right)^{-1}$.
Now, by the properties of the mean map and the covariance map, these same weights can be used to construct the mean states and covariances in the original space through the weighted linear combination of training data,
(30) $\mathbf{X}' = \begin{bmatrix} \mathbf{x}'_1 & \cdots & \mathbf{x}'_N \end{bmatrix}$,
where $\mathbf{x}'_i$ is the state for training datapoint $i$. We can then solve for the state means and covariances with
(31a) $\bar{\mathbf{x}}_k = \mathbf{X}'\boldsymbol{\lambda}_k$,
(31b) $\boldsymbol{\Sigma}_k = \mathbf{X}'\mathbf{\Lambda}_k\mathbf{X}'^T$.
For efficient computation, let $\boldsymbol{\Xi}$ be defined by $\boldsymbol{\Xi} = \mathbf{X}'\tilde{\mathbf{X}}'^T\left( \tilde{\mathbf{X}}'\tilde{\mathbf{X}}'^T + \gamma\mathbf{1} \right)^{-1}$, which can be precomputed during training. Using Sherman-Morrison-Woodbury (SMW) identities, we can then rewrite (31) as
(32) $\bar{\mathbf{x}}_k = \boldsymbol{\Xi}\hat{\mathbf{x}}_k, \quad \boldsymbol{\Sigma}_k = \boldsymbol{\Xi}\hat{\mathbf{P}}_k\boldsymbol{\Xi}^T$.
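A small sketch of this recovery step: precomputing the recovery operator of (32) and checking that it matches the explicit pseudoinverse weights of (29) and (31). The dimensions and data here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, N, gamma = 2, 8, 50, 1e-4

X_orig = rng.normal(size=(n, N))        # training states in the original space
X_lift = rng.normal(size=(d, N))        # stand-ins for their lifted counterparts

# Precompute the recovery operator Xi = X' X'~^T (X'~ X'~^T + gamma 1)^{-1},
# mapping lifted means/covariances back to the original space.
Xi = X_orig @ X_lift.T @ np.linalg.inv(X_lift @ X_lift.T + gamma * np.eye(d))

x_hat = X_lift @ rng.normal(size=N)     # a lifted mean estimate (in the span of X_lift)
P_hat = 0.1 * np.eye(d)                 # a lifted covariance estimate

x_bar = Xi @ x_hat                      # recovered mean
Sigma = Xi @ P_hat @ Xi.T               # recovered covariance
```

The push-through identity behind SMW guarantees that applying `Xi` gives the same answer as first solving for the training-data weights and then forming the weighted combination, while only ever inverting quantities of fixed size at test time.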
Note that (32) assumes that the original states lie within a vector space. The states could instead contain angles or other quantities on a circular domain. In this case, $\bar{\mathbf{x}}_k$ should be a weighted average of circular quantities with weights computed as usual, since the RKHS embeddings are within a vector space, and a similar weighted average is done for $\boldsymbol{\Sigma}_k$. We still use (32) but slightly modify the computation of $\boldsymbol{\Xi}$. See Section VIII for an example.

VII Approximate Embeddings with Random Fourier Features
So far, we have been working in the RKHS and using the feature maps directly. For many kernels, however, these RKHS feature maps are very high (potentially infinite) dimensional. As it is often unclear how to truncate the feature maps directly to get reasonable approximations, we look for another form of approximate embeddings for $\boldsymbol{\phi}$, $\boldsymbol{\upsilon}$, and $\boldsymbol{\theta}$ such that the solution computed through the algorithm approaches that from the true embeddings.
We notice that all of the RKHS quantities computed in our procedure appear only in the form of inner products, $\boldsymbol{\phi}(\mathbf{a})^T\boldsymbol{\phi}(\mathbf{b})$, or outer products, $\boldsymbol{\phi}(\mathbf{a})\boldsymbol{\phi}(\mathbf{b})^T$. Thus, we turn to Random Fourier Features (RFF) for deriving approximate embeddings. RFF is a randomized feature map sampled from the Fourier transform of a shift-invariant kernel [rff]. Suppose we have two quantities, $\mathbf{a}$ and $\mathbf{b}$, in some state space, an operator, $\boldsymbol{\phi}$, for lifting them into RKHS embeddings, $\boldsymbol{\phi}(\mathbf{a})$ and $\boldsymbol{\phi}(\mathbf{b})$, and the kernel function, $k(\cdot, \cdot)$, associated with this RKHS. We can use $\boldsymbol{\rho}$, the RFF embedding corresponding to $\boldsymbol{\phi}$, to approximate the kernel evaluation as
(33) $k(\mathbf{a}, \mathbf{b}) = \boldsymbol{\phi}(\mathbf{a})^T\boldsymbol{\phi}(\mathbf{b}) \approx \boldsymbol{\rho}(\mathbf{a})^T\boldsymbol{\rho}(\mathbf{b})$.
We see that as long as any RKHS quantities within an expression are involved in the form of inner products within the same space (i.e., kernelized), we can swap the exact RKHS embeddings of these quantities with their RFF embeddings to get a valid approximation. We can freely manipulate RKHS quantities within kernelized expressions for efficient computation, and the approximation from swapping the embeddings from RKHS to RFF would still be valid.
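The standard RFF construction of [rff] for the squared-exponential kernel, illustrating the approximation in (33), can be sketched as follows; the lengthscale and feature count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
d, D, ell = 3, 2000, 1.0   # input dim, number of features, kernel lengthscale

# Sample frequencies from the Fourier transform of the squared-exponential
# kernel k(a, b) = exp(-||a - b||^2 / (2 ell^2)), which is a Gaussian.
W = rng.normal(scale=1.0 / ell, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    """Random Fourier Feature embedding: rff(a)^T rff(b) approximates k(a, b)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

a = np.array([0.3, -0.5, 1.0])
c = np.array([0.1, 0.2, 0.4])
approx = rff(a) @ rff(c)
exact = np.exp(-np.sum((a - c) ** 2) / (2.0 * ell ** 2))
print(abs(approx - exact))  # small, shrinking as D grows
```

Because the approximation error concentrates as the number of features grows, any expression built solely from such inner products inherits the same convergence, which is the property the next subsection exploits.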
VII-A Sketch that KoopSE is Kernelized
Since we look to use RFF as approximate embeddings for $\boldsymbol{\phi}$, $\boldsymbol{\upsilon}$, and $\boldsymbol{\theta}$, our goal is to show that KoopSE uses the RKHS quantities only in their kernelized forms. Additional details are presented in Appendix A.
For training, the lifted training points are used to compute the bilinear system matrices in (13). Using SMW identities, it can be seen that the analytical solution of (22) has the form
(34a) $\begin{bmatrix} \mathbf{A} & \mathbf{B} & \mathbf{H} \end{bmatrix} = \tilde{\mathbf{X}}'\mathbf{G}_1\mathbf{Z}^T\boldsymbol{\Gamma}^{-1}$,
(34b) $\mathbf{C} = \tilde{\mathbf{Y}}\mathbf{G}_2\tilde{\mathbf{X}}'^T$,
(34c) $\mathbf{Q} = \tilde{\mathbf{X}}'\mathbf{G}_3\tilde{\mathbf{X}}'^T + \frac{\gamma_5}{N}\mathbf{1}, \quad \mathbf{R} = \tilde{\mathbf{Y}}\mathbf{G}_4\tilde{\mathbf{Y}}^T + \frac{\gamma_6}{N}\mathbf{1}$,
where $\mathbf{G}_1, \ldots, \mathbf{G}_4$ are some kernelized $N \times N$ matrices. At test time, we assume that the initial condition, new control inputs, and new measurements can be written as linear combinations of those seen in training:
(35a) $\boldsymbol{\phi}(\mathbf{x}_0) = \tilde{\mathbf{X}}\mathbf{a}_0$,
(35b) $\boldsymbol{\upsilon}(\mathbf{u}_k) = \tilde{\mathbf{U}}\mathbf{b}_k$,
(35c) $\boldsymbol{\theta}(\mathbf{y}_k) = \tilde{\mathbf{Y}}\mathbf{c}_k$,
for weights $\mathbf{a}_0$, $\mathbf{b}_k$, and $\mathbf{c}_k$. Then, it can be seen from (25) that the time-varying quantities have the form
(36) $\mathbf{A}_k = \tilde{\mathbf{X}}'\mathbf{E}_k\tilde{\mathbf{X}}^T, \quad \tilde{\mathbf{v}}_k = \tilde{\mathbf{X}}'\mathbf{e}_k$,
for some kernelized matrices, $\mathbf{E}_k$, and vectors, $\mathbf{e}_k$. Now, the solution to (26) is exactly given by the RTS smoother [barfoottxtbk]. It is then straightforward to prove, through two induction proofs, one for the forward pass and one for the backward pass, that the state means and covariances for both passes have the forms
(37) $\hat{\mathbf{x}}_k = \tilde{\mathbf{X}}'\mathbf{p}_k, \quad \hat{\mathbf{P}}_k = \tilde{\mathbf{X}}'\mathbf{S}_k\tilde{\mathbf{X}}'^T + s_k\mathbf{1}$,
for $k = 0, \ldots, K$ and some kernelized quantities $\mathbf{p}_k$, $\mathbf{S}_k$, and scalars $s_k$. By substituting these expressions into (31), it can be seen that the results for $\bar{\mathbf{x}}_k$ and $\boldsymbol{\Sigma}_k$, the means and covariances in the original space, are indeed kernelized. Therefore, from input to output, the algorithm only uses RKHS quantities in their kernelized forms.
VII-B Substituting with Random Fourier Features
As the algorithm is kernelized, we can approximate the kernel functions associated with the embeddings by directly swapping the RKHS feature maps with the respective finite-dimensional RFF embeddings of the original quantities. We outline the procedure below and analyze the time and memory complexity of the algorithm.
Let $\boldsymbol{\rho}_x$, $\boldsymbol{\rho}_u$, and $\boldsymbol{\rho}_y$ represent the RFF embeddings for $\boldsymbol{\phi}$, $\boldsymbol{\upsilon}$, and $\boldsymbol{\theta}$, with $D_x$, $D_u$, and $D_y$ being the respective ranks of the approximation. We replace the RKHS embeddings with their RFF counterparts, denoted by $(\breve{\cdot})$, for the training quantities:
(38) $\breve{\mathbf{x}}_i = \boldsymbol{\rho}_x(\mathbf{x}_i), \quad \breve{\mathbf{x}}'_i = \boldsymbol{\rho}_x(\mathbf{x}'_i), \quad \breve{\mathbf{u}}_i = \boldsymbol{\rho}_u(\mathbf{u}_i), \quad \breve{\mathbf{y}}_i = \boldsymbol{\rho}_y(\mathbf{y}_i)$,
where $i = 1, \ldots, N$. The embedded training data in block-matrix form becomes
(39a) $\breve{\mathbf{X}} = \begin{bmatrix} \breve{\mathbf{x}}_1 & \cdots & \breve{\mathbf{x}}_N \end{bmatrix}, \quad \breve{\mathbf{X}}' = \begin{bmatrix} \breve{\mathbf{x}}'_1 & \cdots & \breve{\mathbf{x}}'_N \end{bmatrix}$,
(39b) $\breve{\mathbf{U}} = \begin{bmatrix} \breve{\mathbf{u}}_1 & \cdots & \breve{\mathbf{u}}_N \end{bmatrix}, \quad \breve{\mathbf{Y}} = \begin{bmatrix} \breve{\mathbf{y}}_1 & \cdots & \breve{\mathbf{y}}_N \end{bmatrix}$,
which are used to compute the now finite-dimensional RFF counterparts of the system matrices,
(40a) $\breve{\mathbf{A}}, \quad \breve{\mathbf{B}}, \quad \breve{\mathbf{H}}, \quad \breve{\mathbf{C}}$,
(40b) $\breve{\mathbf{Q}}, \quad \breve{\mathbf{R}}$,
through solving (22) with the embedded training data. Letting $D = D_x + D_u + D_xD_u$, this procedure has a computational complexity of $O(ND^2 + D^3)$ and a memory complexity of $O(ND + D^2)$, both scaling only linearly with the number of training points.
For testing, we replace the initial condition, incoming control inputs, and incoming measurements with their RFFs,
(41a) $\breve{\mathbf{x}}_0 = \boldsymbol{\rho}_x(\mathbf{x}_0)$,
(41b) $\breve{\mathbf{u}}_k = \boldsymbol{\rho}_u(\mathbf{u}_k)$,
(41c) $\breve{\mathbf{y}}_k = \boldsymbol{\rho}_y(\mathbf{y}_k)$,
and we form $\breve{\mathbf{A}}_k$, $\breve{\mathbf{v}}_k$ using (25) with the $(\breve{\cdot})$ quantities. We then solve the RFF equivalent of the LTV system in (26) using a linear batch state estimator, yielding the RFF-equivalent state mean, $\hat{\mathbf{x}}_k$, and covariance, $\hat{\mathbf{P}}_k$, for each timestep $k$. With an RTS smoother or a similarly efficient estimator, this procedure takes a computational complexity of $O(KD_x^3)$ and a memory complexity of $O(KD_x^2)$. Finally, we convert them back into the state space:
(42a) $\bar{\mathbf{x}}_k = \breve{\boldsymbol{\Xi}}\hat{\mathbf{x}}_k$,
(42b) $\boldsymbol{\Sigma}_k = \breve{\boldsymbol{\Xi}}\hat{\mathbf{P}}_k\breve{\boldsymbol{\Xi}}^T$,
where $\breve{\boldsymbol{\Xi}} = \mathbf{X}'\breve{\mathbf{X}}'^T\left( \breve{\mathbf{X}}'\breve{\mathbf{X}}'^T + \gamma\mathbf{1} \right)^{-1}$. Due to kernelization, although the RFF equivalents of the RKHS variables and model matrices may look very different, the results for $\bar{\mathbf{x}}_k$ and $\boldsymbol{\Sigma}_k$ in (42) converge to the true results in (32) as the ranks of the approximations approach infinity. Note that $\breve{\boldsymbol{\Xi}}$ can be precomputed during training. As such, the overall computation and memory complexity of testing is still $O(KD_x^3)$ and $O(KD_x^2)$, respectively, regardless of the amount of training data used. As mentioned before, the computation of $\breve{\boldsymbol{\Xi}}$ would need to be slightly modified if there were circular quantities in the original states. See Algorithm 1 for a summary of the full algorithm.
VIII Experiments and Results
VIII-A Problem Setup
KoopSE was evaluated on a control-affine system with a nonlinear measurement model in simulation, then on an experimental dataset with a similar setup. The problem was estimating the positions and headings of a wheeled robot driving in a 2D plane and receiving range measurements from five ultra-wideband (UWB) sensors. For timestep $k$, the state, input, and measurement are, respectively,
(43) $\mathbf{x}_k = \begin{bmatrix} \mathbf{p}_k \\ \theta_k \end{bmatrix}, \quad \mathbf{u}_k = \begin{bmatrix} v_k \\ \omega_k \end{bmatrix}, \quad \mathbf{y}_k = \begin{bmatrix} y_{k,1} & \cdots & y_{k,5} \end{bmatrix}^T$,
where $\mathbf{p}_k$ is the robot's position, $\theta_k$ is its orientation, $v_k$ is its linear velocity, $\omega_k$ is its angular velocity, and $y_{k,j}$ is the range measurement from the robot to the $j$th UWB sensor. This uses the common state-estimation practice of using interoceptive measurements as inputs in the process model [probabilisticrobotics].
For the state, we used the product of a squared-exponential kernel for the robot's position and a periodic kernel for its orientation. The respective formulas for generating these RFFs can be found in [rff] and [rffperiodic]. The combined RFF for the state is thus the Cartesian product of the two RFF sets [rffperiodic]. For the input, we kept it as is, since there were no improvements from lifting it to higher dimensions, thereby using a linear kernel. We used a squared-exponential RFF for the measurement.
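One possible sketch of such a combined state feature map, using RFF for the position and a simple truncated Fourier basis for the orientation (the exact periodic-kernel features of [rffperiodic] differ in how the harmonics are weighted; all sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
D_pos, n_harm, ell = 200, 3, 0.5

# Squared-exponential RFF for the 2-D position.
W = rng.normal(scale=1.0 / ell, size=(D_pos, 2))
b = rng.uniform(0.0, 2.0 * np.pi, size=D_pos)

def feat_pos(p):
    return np.sqrt(2.0 / D_pos) * np.cos(W @ p + b)

def feat_ang(theta):
    # Truncated Fourier features on the circle: inner products depend only on
    # theta - theta', so the induced kernel is periodic by construction.
    n = np.arange(1, n_harm + 1)
    return np.concatenate(([1.0], np.cos(n * theta), np.sin(n * theta))) / np.sqrt(n_harm + 1)

def feat_state(p, theta):
    # Kronecker product of the two feature sets approximates the product kernel.
    return np.kron(feat_pos(p), feat_ang(theta))

z = feat_state(np.array([1.0, 2.0]), 0.7)
```

Taking the Kronecker (Cartesian) product of the two feature sets is what makes the inner product of combined features factor into a position kernel times an orientation kernel.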
Since orientation is on a circular domain, we take special care in computing the weighted average of states in (42). We convert the orientations to their Cartesian form,
(44) $\theta_k \mapsto \begin{bmatrix} \cos\theta_k \\ \sin\theta_k \end{bmatrix}$,
then use the resulting Cartesian-form training states in place of $\mathbf{X}'$ to compute $\breve{\boldsymbol{\Xi}}$ in (42b). When computing $\bar{\mathbf{x}}_k$ and $\boldsymbol{\Sigma}_k$ in (42a) at test time, we first compute their Cartesian-form equivalents, giving us the Gaussian distribution over the position components along with $\cos\theta_k$ and $\sin\theta_k$. We then use the method of [wang13] to estimate the distribution of $\theta_k$ given the Gaussians of $\cos\theta_k$ and $\sin\theta_k$.

For evaluating estimation algorithms, we use the root-mean-squared error (RMSE) and the Mahalanobis distance of the estimated trajectories. An accurate estimator has an RMSE close to zero, and a consistent estimator has a Mahalanobis distance close to its expected value. We tuned the RFF parameters and the regularizing hyperparameters for KoopSE accordingly for these objectives. We compare our results with a model-based Lie-group extended RTS smoother, whose model covariances are also tuned with the same objectives, including increasing the measurement covariances for the two noisy sensors. KoopSE was first verified in simulation, then validated on an experimental dataset gathered by a robot with the same environment setup. We present our results for the two scenarios below.
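The circular averaging of orientations described above can be sketched as follows (the covariance handling of [wang13] is not shown):

```python
import numpy as np

def circular_weighted_mean(thetas, weights):
    """Weighted average of angles via their Cartesian embedding.

    Averaging (cos, sin) pairs and taking atan2 avoids the wrap-around errors
    that a plain weighted average of raw angles would incur.
    """
    c = weights @ np.cos(thetas)
    s = weights @ np.sin(thetas)
    return np.arctan2(s, c)

# Angles straddling the +/- pi boundary: a naive average of the raw values
# would give roughly 0, but the true mean direction is near pi.
thetas = np.array([np.pi - 0.1, -np.pi + 0.1])
w = np.array([0.5, 0.5])
theta_bar = circular_weighted_mean(thetas, w)
print(theta_bar)  # close to +/- pi, not 0
```

The same weights produced by the recovery step in (42) can be fed directly into such an average, since the lifted-space computations themselves are unaffected by the wrap-around.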
VIII-B Simulation Results
For the simulation, we added a constant unmodelled bias to two of the five sensors, resulting in a UWB error profile similar to that gathered from experiment in Section VIII-C. Training data were generated by the robot roughly following randomly generated trajectories in an enclosed area. One hundred trajectories of 1000 timesteps were used for evaluation, and the combined results are presented in Table I. The error plots for 5 trajectories are shown in Fig. 2. In Fig. 3, we present the results of testing KoopSE on the 5 trajectories using various numbers of RFFs and training data points, in comparison to the model-based smoother.
KoopSE  ModelBased  

Translation RMSE (m)  
Orientation RMSE (rad)  
Translation Maha. distance  
Orientation Maha. distance 
VIII-C Experimental Results
Experiments in a lab setting using a Clearpath Husky Unmanned Ground Vehicle (UGV) were used to validate the proposed approach. A 30-minute dataset of the UGV driving in an indoor environment was collected. Ground-truth position and orientation data were collected using an OptiTrack motion capture system. The wheel odometry, consisting of forward velocity and yaw rate, was calculated from wheel encoders. Five UWB anchors transmitting range measurements were used, with two of the anchors obstructed with metal plates representing clutter in the environment, creating additional measurement bias on the order of 30 cm. A picture of the Husky and the anchors is shown in Fig. 1.
To validate the generality of KoopSE, we ran a 6-fold cross-validation, training on 25 minutes of data and testing on a random 100-second section of the remainder. The error plots for each fold are shown in Fig. 4. The RMSE and Mahalanobis distances for the folds are shown in Fig. 5.
IX Discussion and Conclusion
The results highlight the benefits of the data-driven approach. Table I and Fig. 2 show that in simulation, KoopSE has lower errors than the model-based extended RTS smoother, which is ignorant of the range biases, even though both methods are consistent. This shows that, despite having no knowledge of the robot's motion model, the UWB range measurement model, or the positions of the anchors, KoopSE outperforms the classical method when a small unmodelled measurement bias is introduced. In fact, when the bias is removed, we found that KoopSE performed just as well as the model-based smoother. In Fig. 3, the RMSE of KoopSE quickly decreases over the first 100 RFFs and the first 10000 training points. KoopSE surpassed the classical smoother with only 128 RFFs and 20000 training points (about 17 minutes of data), making it feasible for real-world applications.
The results on the experimental dataset validated our findings. As shown in Fig. 5, compared to the model-based smoother, KoopSE achieved lower translational RMSE and similar orientation RMSE for all folds. Unlike in the simulation setting, the covariance parameters for the model-based smoother achieving the lowest RMSE resulted in overconfident position estimates, despite best efforts in tuning. This is likely due to unmodelled effects in its motion or sensor models, which KoopSE overcame with its model-free approach. Overall, KoopSE is an efficient framework for data-driven state estimation of control-affine systems. We have demonstrated KoopSE's feasibility in a nonlinear UWB-tracking problem, and have shown that this system, and potentially many others, are approximately bilinear-Gaussian in a higher-dimensional space, permitting state estimation with familiar linear tools. This method is applicable for cases where the system model is unknown or has complicated noise distributions, such as in the demonstrated scenario of indoor navigation with UWB sensors.
Although KoopSE requires no prior knowledge of system models, training requires ground-truth states as input. Future work could investigate ways of learning system models without exact ground-truth states. As well, since the learned lifted models have proven to be effective for state estimation, a natural extension is to use the same models for data-driven control under a similar framework.
Appendix A: Detailed Sketch that KoopSE is Kernelized
Our goal is to show that KoopSE uses the RKHS quantities only in their kernelized forms. For training, the lifted training points are used to compute the bilinear system matrices in (13). Using SMW identities, we can write the analytical solution of (22) for $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{H}$, as well as an alternative expression for $\mathbf{C}$:
(45a) $\begin{bmatrix} \mathbf{A} & \mathbf{B} & \mathbf{H} \end{bmatrix} = \tilde{\mathbf{X}}'\left( \mathbf{Z}^T\boldsymbol{\Gamma}^{-1}\mathbf{Z} + \mathbf{1} \right)^{-1}\mathbf{Z}^T\boldsymbol{\Gamma}^{-1}$,
(45b) $\mathbf{C} = \tilde{\mathbf{Y}}\left( \tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}' + \gamma_4\mathbf{1} \right)^{-1}\tilde{\mathbf{X}}'^T$,
where $\mathbf{Z}^T\boldsymbol{\Gamma}^{-1}\mathbf{Z}$ and $\tilde{\mathbf{X}}'^T\tilde{\mathbf{X}}'$ are kernelized $N \times N$ Gram matrices, so that only kernel evaluations between training points are required.