Machine learning is an essential and ubiquitous framework for solving a wide range of tasks in different application areas. One of these tasks is the problem of estimating position (localization) of a mobile robot moving in an uncertain environment, which is necessary for understanding the environment to make navigational decisions. Usually to solve self-localization problem for navigation of autonomous robots a built in visual sensor system (camera) is utilized.
The most common and basic method for performing localization is through dead-reckoning using data received from odometer sensors. This technique integrates a history of sensor readings and executed actions (e.g., velocity history) of the robot over time to determine changes in positions from its starting location [1, 2, 3]. It is common to combine the additional localization technique based on visual data such as images or range profiles (see review ) with dead reckoning applying the extended Kalman filter to probabilistically update the robot position [5, 6]. In passive vision-based localization, position of an autonomous mobile robot equipped by a visual system (e.g., omnidirectional imaging system [7, 8, 9] or camera with steerable orientation ) can be estimated from images captured by the visual system. Continuous set of possible images, which can be captured by the visual system under all possible image formation parameters (relative position and orientation of the robot moving in a certain workspace, as well as camera intrinsic parameters and illumination function) is called Appearance Space (AS).
In this paper, we consider the most popular case when the AS is parameterized by robot localization consisting only of its position and orientation. Assuming that captured images allow distinguishing and recognizing localizations from which they have been taken, the solution to the considered appearance-based (passive vision-based) robot localization problem is a mapping from the AS to the Localization Space (LS), formed by all possible robot localizations.
We consider the appearance-based localization problem in the machine learning framework: the appearance-based model, which describes the relation between observed images and robot locations, is constructed using a finite set of captured images taken in known positions. This model allows estimating an unknown robot localization from a newly acquired image. Such appearance-based learning framework has become very popular in the field of robot learning .
Various appearance-based models (aka maps), describing the underlying low-dimensional structure in the AS, are usually constructed from training positions-images data using supervised learning techniques. Such models provide internal representation of the high-dimensional AS by certain visual features (or landmarks) extracted from images, see a short review in.
In many appearance-based methods input images from the AS are considered holistically, in relation to other images, and natural visual features are computed by projecting images onto low-dimensional subspaces [9, 13, 14] usually via the Principal Component Analysis (PCA). Various types of regression between images or their low-dimensional features and robot coordinates are constructed including Gaussian process regression, random forest, etc.[15, 16, 17, 18, 19, 20, 21, 22].
This paper proposes new geometrically motivated machine learning approach to the appearance-based robot localization problem by combining a few advanced techniques.
At first a deep learning model extracts visual features, representing a learned mapping from the AS to an image Visual Feature Space (VFS) (image of the AS under this mapping). The VFS is a low-dimensional manifold (surface) in the ambient high dimensional space with intrinsic dimensionality equal to the dimensionality of the LS, which in turn is equal to three. Thus, the solution of the Robot localization problem reduces to the regression problem with manifold valued inputs, “living” in the VFS, and outputs, belonging to the LS.
Next the manifold learning procedure from  is used to construct regression with high-dimensional manifold valued inputs.
After that inverse mapping from the LS to the VFS is constructed via special nonlinear dimensionality reduction procedure [24, 25, 26]. This mapping allows predicting visual features of an image that will be captured at the predicted localization of the mobile robot based on chosen navigational decisions. The latter allows solving the robot localization problem as the Kalman filtering problem.
The paper is organized as follows. Section II contains rigorous statement of the appearance-based robot localization problem. Proposed approach is described in Section III. Section IV provides details of this solution. Section V describes results of numerical experiments. Conclusions are given in section VI.
Ii Robot Localization: Rigorous Problem Statement
Let a mobile robot, equipped with a visual system (for example, an omnidirectional imaging system), moves on a 2D-workspace. Its localization
is a three-dimensional vector consisting ofRobot Position (robot coordinates in a -workspace) and Robot Orientation (an angle) relative to the coordinate system in the workspace. Let us denote by a subset consisting of all possible robot localization parameters and called Localization space.
Let an image, captured by the robot imaging system, consists of pixels and, thereby, is represented by an image-vector . We denote by an image, captured by the robot with the localization parameter , which is described by the Image modeling function with the domain of definition . Let us denote by
the Appearance space consisting of all possible images, which can be captured by the mobile robot and parameterized by the robot localization parameter .
Assuming that images, captured by the robot in different localizations, are different, the Image modeling function is a one-to-one mapping from the LS to the AS. Thus, the AS is a three-dimensional manifold without self-intersections (Appearance manifold, AM), parameterized by the chart and embedded in the ambient -dimensional space. Therefore, there exists an inverse mapping
called Localization function from the AM to the LS .
The functions and , as well the AM , are unknown, and the Robot localization problem consists in constructing the robot localization from the image . We consider this problem in the machine learning framework. Let us denote by
a training dataset consisting of images , captured by the robot visual system in known localizations when robot moves in the workspace randomly or on a regular grid. For example, the mobile robot, described in , captures omnidirectional images every centimeters along robot random paths; reference positions are located on a regular grid with cells of size either cm in a mm workspace  or m in a mm workspace , respectively.
We consider the robot localization problem as Localization function estimation problem: to recover an unknown Localization function at an arbitrary out-of-sample point , from its known values at known points .
Iii Robot Localization: Proposed Approach
Proposed approach consists of four successively executed steps:
pre-process captured images to get image visual features;
estimate robot localization from visual features of captured images;
predict visual features of an image that will be captured at a new given robot localization;
solve the robot localization problem via the Kalman filtering technique.
Details are given below.
Iii-a Preprocessing Captured Images
Image data requires subject-matter expertise to extract key features. Deep convolutional neural networks (DCNNs) extract features automatically from domain-specific images, without any feature engineering techniques. This process makes DCNNs suitable for extraction of visual features:
DCNN with many layers is trained using some extensive image database,
Usually initial layers learn low-order features (e.g. color, edges, etc.),
The constructed mapping, realized by final layers of DCNN, determines Visual Feature Space (VFS)
consisting of visual features for all images from the AS.
As well the AS , the VFS is a three-dimensional manifold parameterized by the chart and embedded in an ambient -dimensional space. The mapping , called Feature modeling function, predicts features of an image that is captured at an arbitrary robot localization .
Iii-B Nonlinear Regression on High-dimensional Inputs: Localization Function Estimation
Using the constructed mapping , initial Localization function estimation problem is reduced to the regression problem on the VFS : using dataset
with known robot localizations and computed features from images , captured in these localizations, we estimate an unknown robot position from visual features of an image , captured at some new robot localization . The solution determines the sought-for mapping .
Initial and reduced regression problems deal with high dimensional inputs, and standard regression methods perform poorly due the statistical and computational ‘curse of dimensionality’ phenomenon. However, by construction the input spaces in both problems are three-dimensional manifolds parameterized by the chartsand and embedded in the ambient high-dimensional spaces and , respectively. Taking into account this fact, in order to avoid the curse of dimensionality phenomena most of approaches use various dimensionality reduction techniques (usually, PCA) for constructing low-dimensional visual features by projecting captured images onto low-dimensional linear subspaces. The robot localization function is estimated from such visual features [11, 13, 14]. However, constructed low-dimensional linear subspaces may have a larger dimension than the real intrinsic dimension, which is equal to three.
Our approach uses new solution to nonlinear multi-output regression problem with input belonging to unknown manifold . The core of the solution is based on a special nonlinear dimensionality reduction method called Grassmann & Stiefel Eigenmaps [24, 25]; this method has already been successfully used to solve certain statistical learning problem .
The solution  results in
embedding mapping from the VFS to the three-dimensional Representation Space (RS) consisting of representations for computed features ;
recovered mapping , which maps the low-dimensional representation of visual features to the robot localization .
Therefore, the mapping is estimated by the statistic and sought-for Localization function is estimated by
Iii-C Nonlinear Regression with High-dimensional Inputs: Feature modeling function estimation
Consider the problem of Feature modeling function estimation from the dataset (5). This regression problem can be described as follows: estimate an unknown mapping from the LS to the VFS using known visual features of images captured at known points . Here we have a regression problem with high-dimensional manifold valued outputs belonging to the -dimensional manifold (4).
The solution  also realizes recovery mapping from the low-dimensional representation to its preimage (visual features ). This mapping is ‘approximately inverse’ to the embedding mapping and provides proximity
This quantity allows estimating the Feature modeling function by the formula
Iii-D Kalman Filtering in Robot Localization Problem
Robot navigation consists in choosing the control
at given time moments. Let be a current robot position at time , then, under some chosen control , the robot must move to the expected position
where is a known function defined by a solution of a navigation motion control problem.
In practice, the estimated position of the robot at time is only known, which differs from the exact position ; the exact position at time also differs from the expected position .
Let a robot visual sensing system provides a captured image at the moment . We want to solve a filtering problem to improve the predicted localization from the captured image .
where is a Kalman gain.
Using the estimator (6) for the Localization function (2), we can use a quantity , representing visual features, as an estimator of the robot pose in which the image has been taken, and construct the estimator as
where is another gain function.
As measurements, used in filtering procedures, it is possible to use low-dimensional representations of visual features , obtained as a solution of nonlinear multi-output regression on unknown input manifold problem . Such estimator has the form
where is some gain function.
For choosing the optimal gain functions in above Kalman filtering procedures, it is necessary to know covariance matrices for deviations between observations and their expected values, as well as between the expected robot localizations (9) and its real pose . Corresponding covariance matrices can be estimated from samples (3) and (5) in which robot localizations are known with high accuracy.
Iv Robot Localization: Solution
Preprocessing step is a standard one and usage of Kalman filtering procedures in robot localization problem were described in Sections III-A and III-B, respectively. In this section we describe shortly the estimating procedures for Robot localization and Feature modeling functions based on the solution of nonlinear regression problem with high-dimensional manifold valued inputs.
Iv-a Robot Localization Manifold Estimation Problem
Consider an unknown smooth manifold called Regression manifold (RM)
with the intrinsic dimension , which is embedded in an ambient -dimensional Euclidean space and parameterized by an unknown chart
defined on the Localization space . The RM can be considered as a direct product of the VFS and the LS .
be Jacobian matrix of the mapping (11) which is split into Jacobian matrix of the mapping and being a unit matrix. The Jacobian (12) determines a three-dimensional linear space in which is a tangent space to the RM at the point ; hereinafter, is a linear space spanned by columns of an arbitrary matrix .
The set consists of points of the RM , equipped by tangent spaces at these points, is known in the Manifold theory  as the Tangent Bundle of the RM .
Iv-B Robot Localization Manifold Estimation: GSE Solution
Using Grassmann & Stiefel Eigenmaps (GSE) method [24, 25], applied to the sample (13), we construct the solution to the Tangent bundle manifold learning problem, resulting in the following quantities:
sample-based area which is close to the unknown RM ,
embedding mapping from the area to the Representation Space (RS)
recovery mapping from the RS to ,
matrix defined on the RS ,
which together provides both
between initial and recovered points and . Thanks to (14) we get small Hausdorff distance between the RM and the three-dimensional recovered regression manifold (RRM)
embedded in the ambient -dimensional Euclidean space;
in which is a Jacobian matrix of the mapping . Thanks to (16) we get proximity between the tangent space to the RM at the point and the tangent space to the RRM (15) at the nearby recovered point . The proximity between these tangent spaces, considered as elements of the Grassmann manifold, is defined using chosen metric on the Grassmann manifold.
Therefore, the tangent bundle
of the RRM accurately approximates the tangent bundle .
A splitting of an arbitrary vector into two vectors and implies the corresponding partitions
of the mapping and the matrix .
Using the representation (11), the embedding mapping , defined on the RM , can be written as a function
defined on the LS .
Using the mapping , the RM and the embedding mapping can be written as
respectively, where the functions
and (20) are defined on the VFS .
The and Jacobian matrices (the covariant differentiation is used here) and of the mappings and can be estimated  by the matrices
Although the GSE-based functions (23) and (24) accurately approximate the sought-for functions and , respectively, they cannot be considered as the solution to the Robot Localization problem because the mappings and (17), (18) depend on the argument
whose values are known only at sample points:
Based on known values (26) of the functions (20) and (19) at sample points, as well as on the known values of their Jacobian matrices (21), (22) at these points, the estimators and of these functions at arbitrary points and , respectively, are constructed using the Jacobian Regression method, proposed in .
GSE solution applied to the RM includes construction of the kernels and on the VFS and the LS , respectively. These kernels reflect not only geometrical closeness between points and but also closeness between the tangent spaces and to the RM .
Using these kernels, the Jacobian Regression method gives the estimators and in an explicit form defined by formulas
respectively, where and .
Iv-C Robot Localization: Final Formulas
for the Localization function , and the final estimator
for the Feature modeling function .
Note that most of known appearance-based learning methods solve only the Localization function estimation problem.
V Numerical experiments
We used vehicle moving in a city “Multi-FoV” synthetic dataset  and considered it as a testing problem for robot localization methods. The data set consists of color images of size and information about the vehicle position for each of them. Measurements from three different optical vision systems (perspective, fisheye and catadioptric) are presented in the dataset.
Firstly, we estimated the intrinsic dimensionality of the data using three different popular approaches : global via IsoMap , local via correlation dimension  and pointwise via maximum likelihood method from . The results of estimation are presented in Table I. The true dataset intrinsic dimension is equal to one by design as a dimensionality of a well-sampled continuous trajectory of the vehicle. The overestimated dimensionality by local and pointwise approaches could be caused by extremely large data dimensionality ().
We also applied GSE-approach for localization vs kernel nonparametric regression (KNR). Random subsample with of data points was used as a training set, and the rest of data points was used as a testing set. Orientation was not considered. The optimal kernel bandwidth for the KNR was estimated globally and the same for all coordinates (color channels of pixels) using leave-one-out cross-validation. The relative root means square error (RRMSE) was used as an error measure. Results are presented in Table II. One can see that GSE provides better results. One of explanations is that GSE uses adaptive kernel bandwidth in the original space and takes into account first order effects.
The GSE-algorithm not only reduces dimensionality but also constructs reconstruction mapping (Figure 2) and estimates differential structure such as tangent spaces (Figure 3) which could be useful for velocity estimation. One can see that reduced-reconstructed images are almost the same as original ones but a bit blurred (the norm of difference of the original image Frobenius norm) and the tangent space represents borders of moving objects.
We consider an appearance-based robot self-localization problem. Using recent manifold learning and deep learning techniques, we propose a new geometrically motivated solution based on training data consisting of a finite set of images captured in known locations of the robot. Numerical experiments demonstrated efficiency of the proposed approach. Further full-scale experiments under different external conditions are underway.
E. Burnaev was supported by the RFBR grants 16-01-00576 A and 16-29-09649 ofi_m. A. Bernstein and Y. Yanovich were supported by the Russian Science Foundation grant (project 14-50-00150).
Talluri, R., Aggarwal, J.K.: Position estimation techniques for an autonomous mobile robot — A review. In: Handbook of Pattern Recognition and Computer Vision, Chen, C.H., Pau, L.F., Wang, P.S.P.(Eds.), Ch. 4.4, pp. 769–801. Singapore: World Scientific (1993)
-  Borenstein, J.H., Everett, R., Feng, L., Wehe, D.: Mobile robot positioning: Sensors and techniques. J. Robot. Syst. 14, 231–249 (1997)
-  J. Simanek: Data fusion for localization using state estimation and machine learning. PhD thesis, Czech Technical University, Prague (2015)
-  Bonin-Font, F., Ortiz, A., Oliver, G.: Visual navigation for mobile robots: A survey. Journal of Intelligent and Robotic Systems 53(3), 263–296 (2008)
-  Candy, J.V.: Model-based signal processing. John Wiley & Sons, Inc. (2006)
-  Olson, C.F.: Probabilistic self-localization for mobile robots. IEEE Transactions on Robotics and Automation 16(1), 55–66 (2000)
-  Kröse, B.J.A., Vlassis, N., Bunschoten, R.: Omnidirectional Vision for Appearance-Based Robot Localization. Lecture Notes in Computer Science, v. 2238 “Sensor Based Intelligent Robots”, Springer, Berlin Heidelberg, 39–50 (2002)
-  Saito, M., Kitaguchi, K.: Appearance based robot localization using regression models. In: Proceedings of 4th IFAC-Symposium on Mechatronic Systems, v. 2, pp. 584–589 (2006)
-  Hamm, J., Lin, Y., Lee, D.D.: Learning nonlinear appearance manifolds for robot localization. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 1239–1244 (2005)
-  Crowley, J.L., Pourraz, F.: Continuity Properties of the Appearance Manifold for Mobile Robot Position Estimation. Image and Vision Computing 19(11), 741-752 (2001)
-  Pauli, J.: Learning-Based Robot Vision. Lecture Notes in Computer Science, vol. 2048 “Principles and Applications”, Springer, Heidelberg, 292 pp. (2001)
A.P. Kuleshov, A.V. Bernstein, E.V. Burnaev. Mobile Robot Localization via Machine Learning. Lecture Notes in Artificial Intelligence, vol. 10358 “Machine Learning and Data Mining in Pattern Recognition”, 15 pp., Springer International Publishing AG, Verlag, (2017)
-  Vlassis, N., Krose, B.J.A.: Robot environment modeling via principal component regression. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’99), pp. 677–682 (1999)
-  Se, S., Lowe, D., Little, J.: Local and global localization for mobile robots using visual landmarks. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), pp. 414–420 (2001)
-  Wang, K., Wang, W., Zhuang, Y.: Appearance-Based Map Learning for Mobile Robot by Using Generalized Regression Neural Network. Lecture Notes in Computer Science, vol. 4491 “Advances in Neural Networks”, Springer-Verlag, Berlin, Heidelberg, 834–842 (2007)
-  Crowley, J.L., Wallner, F., Schiele, B.: Position estimation using principal components of range data. In: Proceedings of the 1998 IEEE International Conference on Robotics and Automation, vol. 4, pp. 3121–3128 (1998)
Do, H.N., Choi, J., Lim, C.Y., Maiti, T.: Appearance-based localization using Group LASSO regression with an indoor experiment. In: Proceedings of the 2015 IEEE International Conference on Advanced Intelligent Mechatronics (AIM 2015), pp. 984–989 (2015)
-  Do, H.N., Choi, J.: Appearance-based outdoor localization using group lasso regression. In: Proceedings of the ASME Dynamic Systems and Control Conference (DSCC 2015), vol. 3, 8 pp. (2015)
-  Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society. Series B (Methodological), 73(3), 273–282 (2011)
-  Burnaev, E., Belyaev, M., Kapushev, E.: Computationally efficient algorithm for Gaussian Processes based regression in case of structured samples. Computational Mathematics and Mathematical Physics. 56(4), 499–513 (2016)
-  Burnaev, E., Panov, M., Zaytsev, A.: Regression on the Basis of Nonstationary Gaussian Processes with Bayesian Regularization. Journal of Communications Technology and Electronics. 61(6), 661–671 (2016)
-  Burnaev, E., Zaytsev, A. Surrogate modeling of mutlifidelity data for large samples. Journal of Communications Technology and Electronics. 60(12), 1348–1355 (2016)
-  Kuleshov, A.P., Bernstein, A.V.: Nonlinear multi-output regression on unknown input manifold. Annals of Mathematics and Artificial Intelligence, 32 pp (2017)
-  Bernstein, A.V., Kuleshov, A.P.: Manifold Learning: generalizing ability and tangent proximity. Int. J. Softw. Inf. 7(3), 359–390 (2013)
-  Bernstein, A.V., Kuleshov, A.P.: Dimensionality Reduction in Statistical Learning, IEEE Conference Proceedings of the 13th International Conference on Machine Learning and Applications (ICMLA-2014), IEEE Computer Society, pp. 330–335 (2014)
-  Kuleshov, A.P., Bernstein, A.V., Yanovich, Y.A.: Statistical Learning via Manifold Learning. Proceedings of the 14th International IEEE Conference on Machine Learning and Applications (ICMLA-2015), IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA, USA, pp. 64-69 (2015)
-  Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks?. Advances in Neural Information Processing Systems 27, 3320–3328 (2014)
-  Lee, John M.: Introduction to Smooth Manifolds. New York: Springer-Verlag (2003)
-  Bernstein, A.V., Kuleshov, A.P.: Tangent Bundle Manifold Learning via Grassmann & Stiefel Eigenmaps, arXiv:1212.6031v1[cs.LG], pp. 1–25 (2012)
-  Kuleshov, A.P., Bernstein, A.V.: Incremental Construction of Low-dimensional Data Representations. Lecture Notes in Artificial Intelligence, vol. 9896 “Artificial Neural Networks for Pattern Recognition,” pp. 55–67. Springer Heidelberg (2016)
-  Golub, G.H., Van Loan, C.F.: Matrix Computation, 3rd edn. Johns Hopkins University Press, Baltimore, MD (1996)
-  Kuleshov, A.P., Bernstein, A.V.: Regression on High-dimensional Inputs. In: Workshops Proceedings volume of the IEEE International Conference on Data Mining (ICDM 2016), pp. 732–739. USA, IEEE Computer Society (2016)
-  Zhang, Z., Rebecq, H., Forster, C., Scaramuzza, D.: Benefit of Large Field-of-View Cameras for Visual Odometry. IEEE International Conference on Robotics and Automation (ICRA). 1–8 (2016)
-  Camastra, F., Staiano, A.: Intrinsic Dimension Estimation: Advances and Open Problems. Information Sciences. 328. 26-41 (2016)
-  Levina, E., Bickel, P.J.: Maximum Likelihood Estimation of Intrinsic Dimension. Advances in Neural Information Processing Systems. 777–784 (2005)
-  Tenenbaum, J. B., de Silva, V.,Langford, JC.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 5500 (290), 2319–2323 (2000)
-  Granata, D., Carnevale, V.: Accurate Estimation of the Intrinsic Dimension Using Graph Distances: Unraveling the Geometric Complexity of Datasets. Sci Rep 31377 (6), 1–12 (2016)