I Introduction
Machine learning is an essential and ubiquitous framework for solving a wide range of tasks in different application areas. One of these tasks is the problem of estimating the position (localization) of a mobile robot moving in an uncertain environment, which is necessary for understanding the environment and making navigational decisions. Usually, the self-localization problem for navigation of autonomous robots is solved using a built-in visual sensor system (camera).
The most common and basic method for performing localization is dead reckoning using data received from odometer sensors. This technique integrates a history of sensor readings and executed actions (e.g., the velocity history) of the robot over time to determine changes in position from its starting location [1, 2, 3]. It is common to combine dead reckoning with an additional localization technique based on visual data such as images or range profiles (see the review [4]), applying the extended Kalman filter to probabilistically update the robot position [5, 6]. In passive vision-based localization, the position of an autonomous mobile robot equipped with a visual system (e.g., an omnidirectional imaging system [7, 8, 9] or a camera with steerable orientation [10]) can be estimated from images captured by that system. The continuous set of possible images that can be captured by the visual system under all possible image formation parameters (relative position and orientation of the robot moving in a certain workspace, as well as camera intrinsic parameters and the illumination function) is called the Appearance Space (AS).
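Dead reckoning reduces to integrating the motion commands step by step; the following is a minimal planar sketch (the velocity model and step size are illustrative assumptions, not taken from the cited systems):

```python
import math

def dead_reckon(pose, controls, dt=0.1):
    """Propagate a planar pose (x, y, heading) by integrating a history of
    (linear velocity, angular velocity) commands over time steps of length dt."""
    x, y, a = pose
    for v, w in controls:
        x += v * math.cos(a) * dt
        y += v * math.sin(a) * dt
        a += w * dt
    return x, y, a

# Driving straight along the x-axis for 10 steps at 1 m/s moves the robot
# about 1 m forward while y and the heading stay at zero.
print(dead_reckon((0.0, 0.0, 0.0), [(1.0, 0.0)] * 10))
```

Because each step only accumulates increments, odometry errors grow without bound, which is why such a scheme is combined with vision-based corrections.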
In this paper, we consider the most popular case, in which the AS is parameterized by the robot localization consisting only of its position and orientation. Assuming that captured images allow distinguishing and recognizing the localizations from which they have been taken, the solution to the considered appearance-based (passive vision-based) robot localization problem is a mapping from the AS to the Localization Space (LS), formed by all possible robot localizations.
We consider the appearance-based localization problem in the machine learning framework: the appearance-based model, which describes the relation between observed images and robot locations, is constructed from a finite set of captured images taken at known positions. This model allows estimating an unknown robot localization from a newly acquired image. Such an appearance-based learning framework has become very popular in the field of robot learning [11].
Various appearance-based models (aka maps), describing the underlying low-dimensional structure of the AS, are usually constructed from training position-image data using supervised learning techniques. Such models provide an internal representation of the high-dimensional AS by certain visual features (or landmarks) extracted from images; see a short review in
[12]. In many appearance-based methods, input images from the AS are considered holistically, in relation to other images, and natural visual features are computed by projecting images onto low-dimensional subspaces [9, 13, 14], usually via Principal Component Analysis (PCA). Various types of regression between images (or their low-dimensional features) and robot coordinates are then constructed, including Gaussian process regression, random forests, etc.
[15, 16, 17, 18, 19, 20, 21, 22].

This paper proposes a new geometrically motivated machine learning approach to the appearance-based robot localization problem, combining a few advanced techniques.
First, a deep learning model extracts visual features, representing a learned mapping from the AS to an image Visual Feature Space (VFS), the image of the AS under this mapping. The VFS is a low-dimensional manifold (surface) in an ambient high-dimensional space, with intrinsic dimensionality equal to the dimensionality of the LS, which in turn is equal to three. Thus, the solution of the robot localization problem reduces to a regression problem with manifold-valued inputs, “living” in the VFS, and outputs belonging to the LS.
Next, the manifold learning procedure from [23] is used to construct a regression with high-dimensional manifold-valued inputs.
After that, an inverse mapping from the LS to the VFS is constructed via a special nonlinear dimensionality reduction procedure [24, 25, 26]. This mapping allows predicting the visual features of the image that will be captured at the predicted localization of the mobile robot, based on the chosen navigational decisions. The latter allows solving the robot localization problem as a Kalman filtering problem.
The paper is organized as follows. Section II contains a rigorous statement of the appearance-based robot localization problem. The proposed approach is described in Section III. Section IV provides details of the solution. Section V describes the results of numerical experiments. Conclusions are given in Section VI.
II Robot Localization: Rigorous Problem Statement
Let a mobile robot, equipped with a visual system (for example, an omnidirectional imaging system), move in a 2D workspace. Its localization $\theta$ is a three-dimensional vector consisting of the Robot Position (the robot coordinates in the workspace) and the Robot Orientation (an angle) relative to the coordinate system of the workspace. Let us denote by $\Theta \subset \mathbb{R}^3$ the subset consisting of all possible robot localization parameters, called the Localization space.

Let an image captured by the robot imaging system consist of $p$ pixels and thereby be represented by an image vector in $\mathbb{R}^p$. We denote by $X = \varphi(\theta)$ the image captured by the robot with localization parameter $\theta$, described by the Image modeling function $\varphi$ with domain of definition $\Theta$. Let us denote by

$\mathcal{A} = \{X = \varphi(\theta) \in \mathbb{R}^p : \theta \in \Theta\}$  (1)

the Appearance space consisting of all possible images which can be captured by the mobile robot, parameterized by the robot localization parameter $\theta$.
Assuming that images captured by the robot in different localizations are different, the Image modeling function $\varphi$ is a one-to-one mapping from the LS $\Theta$ to the AS $\mathcal{A}$. Thus, the AS is a three-dimensional manifold without self-intersections (the Appearance manifold, AM), parameterized by the chart $\varphi$ and embedded in the ambient $p$-dimensional space. Therefore, there exists an inverse mapping

$\theta = L(X) = \varphi^{-1}(X)$  (2)

called the Localization function, from the AM $\mathcal{A}$ to the LS $\Theta$.
The functions $\varphi$ and $L$, as well as the AM $\mathcal{A}$, are unknown, and the Robot localization problem consists in reconstructing the robot localization $\theta$ from the image $X = \varphi(\theta)$. We consider this problem in the machine learning framework. Let us denote by

$\{(\theta_i, X_i = \varphi(\theta_i)),\ i = 1, 2, \ldots, n\}$  (3)

a training dataset consisting of images $X_i$ captured by the robot visual system in known localizations $\theta_i$ while the robot moves in the workspace randomly or on a regular grid. For example, the mobile robot described in [7] captures omnidirectional images at fixed intervals along random robot paths, while in [8] and [9] the reference positions are located on regular grids covering the workspace.
We consider the robot localization problem as a Localization function estimation problem: to recover the unknown Localization function $L$ at an arbitrary out-of-sample point $X \in \mathcal{A}$ from its known values $\theta_i = L(X_i)$ at the known points $X_i$.
III Robot Localization: Proposed Approach
The proposed approach consists of four successively executed steps:

1. preprocess captured images to get image visual features;

2. estimate the robot localization from the visual features of captured images;

3. predict the visual features of the image that will be captured at a new given robot localization;

4. solve the robot localization problem via the Kalman filtering technique.

Details are given below.
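The four steps compose into a single localization cycle; here is a schematic sketch with placeholder callables (all names are illustrative stand-ins, not the paper's actual estimators):

```python
def localization_cycle(image, prior_pose, extract_features, estimate_pose,
                       predict_features, kalman_update):
    """One cycle of the four-step scheme: extract features from the captured
    image, turn them into a pose estimate, predict the features expected at
    the prior pose, and fuse everything with a Kalman-style update."""
    features = extract_features(image)                 # step 1
    measured_pose = estimate_pose(features)            # step 2
    expected_features = predict_features(prior_pose)   # step 3
    return kalman_update(prior_pose, measured_pose,
                         features, expected_features)  # step 4

# Trivial scalar stand-ins just to show the data flow through the cycle.
out = localization_cycle(
    image=4.0, prior_pose=1.0,
    extract_features=lambda im: im / 2.0,
    estimate_pose=lambda u: u + 1.0,
    predict_features=lambda p: p,
    kalman_update=lambda p, m, u, eu: 0.5 * (p + m))
print(out)  # -> 2.0
```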
III-A Preprocessing Captured Images
Extracting key features from image data usually requires subject-matter expertise. Deep convolutional neural networks (DCNNs) extract features automatically from domain-specific images, without any manual feature engineering. This makes DCNNs suitable for the extraction of visual features:

- a DCNN with many layers is trained using some extensive image database;

- the initial layers usually learn low-order features (e.g., color, edges, etc.);

- the final layers learn higher-order representations (specific to the input image) that are subsequently used for image classification and transfer learning, see [27].
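A toy numerical illustration of this layered feature extraction (fixed hand-written edge filters stand in for the learned early layers, and ReLU plus global average pooling stands in for the pooled final-layer representation; a real DCNN learns its filters from data):

```python
import numpy as np

def toy_conv_features(img):
    """Convolve an image with two fixed 3x3 edge filters, apply ReLU and
    global average pooling; returns a 2-vector of pooled responses."""
    k_vert = np.array([[1., 0., -1.]] * 3)  # fires on left-bright vertical edges
    k_horiz = k_vert.T                      # fires on top-bright horizontal edges
    feats = []
    for k in (k_vert, k_horiz):
        h, w = img.shape[0] - 2, img.shape[1] - 2
        resp = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                resp[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
        feats.append(np.maximum(resp, 0.0).mean())  # ReLU + global avg pool
    return np.array(feats)

img = np.zeros((8, 8))
img[:, :4] = 1.0                 # bright left half: a single vertical edge
print(toy_conv_features(img))    # -> [1. 0.]: only the vertical filter fires
```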
The constructed mapping $f$, realized by the final layers of the DCNN, determines the Visual Feature Space (VFS)

$\mathcal{F} = \{u = f(X) : X \in \mathcal{A}\}$  (4)

consisting of the visual features $u = f(X)$ for all images $X$ from the AS. Like the AS $\mathcal{A}$, the VFS $\mathcal{F}$ is a three-dimensional manifold, parameterized by the chart $F = f \circ \varphi$ and embedded in an ambient $q$-dimensional space. The mapping $F$, called the Feature modeling function, predicts the features of the image captured at an arbitrary robot localization $\theta$.
III-B Nonlinear Regression on High-dimensional Inputs: Localization Function Estimation
Using the constructed mapping $f$, the initial Localization function estimation problem is reduced to a regression problem on the VFS $\mathcal{F}$: using the dataset

$\{(\theta_i, u_i = f(X_i)),\ i = 1, 2, \ldots, n\}$  (5)

with known robot localizations $\theta_i$ and the features $u_i$ computed from the images $X_i$ captured in these localizations, we estimate an unknown robot localization $\theta$ from the visual features $u = f(X)$ of an image $X$ captured at this new localization. The solution determines the sought-for mapping from the VFS to the LS.
The initial and reduced regression problems deal with high-dimensional inputs, and standard regression methods perform poorly on them due to the statistical and computational ‘curse of dimensionality’ phenomenon. However, by construction, the input spaces in both problems are three-dimensional manifolds, parameterized by their charts and embedded in ambient high-dimensional spaces. Taking this fact into account, in order to avoid the curse of dimensionality most approaches use various dimensionality reduction techniques (usually, PCA) for constructing low-dimensional visual features by projecting captured images onto low-dimensional linear subspaces. The robot localization function is then estimated from such visual features [11, 13, 14]. However, the constructed low-dimensional linear subspaces may have a larger dimension than the real intrinsic dimension, which is equal to three.

Our approach uses a new solution to the nonlinear multi-output regression problem with inputs belonging to an unknown manifold [23]. The core of the solution is based on a special nonlinear dimensionality reduction method called Grassmann & Stiefel Eigenmaps [24, 25]; this method has already been successfully used to solve certain statistical learning problems [26].
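The PCA projection mentioned above can be sketched in a few lines (tiny random stand-in "images" here; a real system would project the captured camera frames):

```python
import numpy as np

def pca_project(images, k=3):
    """Flatten images, center them, and project onto the top-k principal
    directions obtained from the SVD of the centered data matrix."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:k].T     # n x k matrix of holistic features

rng = np.random.default_rng(0)
imgs = rng.random((20, 8, 8))        # 20 tiny 8x8 stand-in images
features = pca_project(imgs, k=3)
print(features.shape)                # -> (20, 3)
```

As noted, such a linear subspace may need more than three dimensions to capture the appearance manifold faithfully, which is what motivates the nonlinear procedure used here instead.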
The solution of [23] results in:

- an embedding mapping $h$ from the VFS to the three-dimensional Representation Space (RS), consisting of the representations $z = h(u)$ of the computed features $u$;

- a recovered mapping $g$, which maps the low-dimensional representation $z$ of the visual features to the robot localization $\theta$.

Therefore, the robot localization is estimated by the statistic $g(h(u))$, and the sought-for Localization function is estimated by

$\widehat{L}(X) = g(h(f(X)))$  (6)
III-C Nonlinear Regression with High-dimensional Inputs: Feature Modeling Function Estimation
Consider the problem of Feature modeling function estimation from the dataset (5). This regression problem can be described as follows: estimate the unknown mapping $F$ from the LS to the VFS using the known visual features $u_i$ of images captured at the known points $\theta_i$. Here we have a regression problem with high-dimensional manifold-valued outputs belonging to the three-dimensional manifold (4).
The solution of [23] also provides a recovery mapping $g^{*}$ from a low-dimensional representation $z$ to its preimage (the visual features $u$). This mapping is ‘approximately inverse’ to the embedding mapping $h$ and provides the proximity

$g^{*}(h(u)) \approx u$  (7)

This property allows estimating the Feature modeling function by the formula

$\widehat{F}(\theta) = g^{*}(z(\theta))$,  (8)

where $z(\theta)$ denotes the low-dimensional representation predicted for the localization $\theta$ (see Section IV).
III-D Kalman Filtering in the Robot Localization Problem
The constructed estimator (8), which predicts the visual features of an image captured at the localization $\theta$, can be used in a Kalman filtering procedure [5] for robot localization.
Robot navigation consists in choosing controls $c_t$ at given time moments $t = 1, 2, \ldots$. Let $\theta_t$ be the current robot localization at time $t$; then, under a chosen control $c_t$, the robot must move to the expected position

$\theta_{t+1}^{exp} = q(\theta_t, c_t)$,  (9)

where $q$ is a known function defined by the solution of a navigation motion control problem.

In practice, only an estimate $\widehat{\theta}_t$ of the robot position at time $t$ is known, which differs from the exact position $\theta_t$; the exact position $\theta_{t+1}$ at time $t+1$ also differs from the expected position $\theta_{t+1}^{exp}$.
Let the robot visual sensing system provide a captured image $X_{t+1}$ at the moment $t+1$. We want to solve a filtering problem: improve the predicted localization using the captured image $X_{t+1}$.

The constructed estimator (8) allows predicting the visual features of the captured image $X_{t+1}$, and the standard Kalman filter [5] constructs the improved localization $\widehat{\theta}_{t+1}$ by correcting the expected position with the residual between the observed and predicted features, weighted by a Kalman gain.

Using the estimator (6) for the Localization function (2), we can also treat the quantity $g(h(f(X_{t+1})))$, computed from the visual features, as an estimator of the robot pose in which the image has been taken, and construct $\widehat{\theta}_{t+1}$ by correcting the expected position with the residual between this pose estimate and the expected position, scaled by another gain function.

As the measurements used in the filtering procedures, it is also possible to use the low-dimensional representations of the visual features, obtained from the solution of the nonlinear multi-output regression problem on an unknown input manifold [23]; the corresponding estimator has the same form with some gain function.
To choose the optimal gain functions in the above Kalman filtering procedures, it is necessary to know the covariance matrices of the deviations between observations and their expected values, as well as between the expected robot localization (9) and the real pose. The corresponding covariance matrices can be estimated from the samples (3) and (5), in which the robot localizations are known with high accuracy.
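A minimal one-dimensional sketch of the correction step described above (scalar state and illustrative variances; the actual filter runs on the 3-D pose with covariance matrices estimated from the samples (3) and (5)):

```python
def kalman_update(pred_pose, pred_var, meas_pose, meas_var):
    """Fuse the motion-model prediction with a measurement-derived pose;
    the gain weighs the two sources by their variances."""
    gain = pred_var / (pred_var + meas_var)
    pose = pred_pose + gain * (meas_pose - pred_pose)
    var = (1.0 - gain) * pred_var
    return pose, var

# Equally uncertain prediction (2.0) and image-based pose estimate (3.0):
# the filter settles halfway and halves the uncertainty.
print(kalman_update(2.0, 1.0, 3.0, 1.0))   # -> (2.5, 0.5)
```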
IV Robot Localization: Solution
The preprocessing step is a standard one, and the usage of Kalman filtering procedures in the robot localization problem was described in Sections III-A and III-D, respectively. In this section we briefly describe the estimation procedures for the Localization and Feature modeling functions, based on the solution of the nonlinear regression problem with high-dimensional manifold-valued inputs.
IV-A Robot Localization Manifold Estimation Problem
Consider an unknown smooth manifold, called the Regression manifold (RM),

$\mathcal{M} = \{M(\theta) = (F(\theta), \theta) \in \mathbb{R}^{q+3} : \theta \in \Theta\}$  (10)

with intrinsic dimension 3, embedded in an ambient $(q+3)$-dimensional Euclidean space and parameterized by the unknown chart

$M(\theta) = (F(\theta), \theta)$  (11)

defined on the Localization space $\Theta$; here $F$ is the Feature modeling function and $q$ is the dimension of the ambient space of the VFS. The RM can be considered as a subset of the direct product of the ambient spaces of the VFS and the LS.
Let

$J_M(\theta) = \begin{pmatrix} J_F(\theta) \\ I_3 \end{pmatrix}$  (12)

be the Jacobian matrix of the mapping (11), which splits into the Jacobian matrix $J_F(\theta)$ of the mapping $F$ and the $3 \times 3$ unit matrix $I_3$. The Jacobian (12) determines a three-dimensional linear space $\mathrm{Span}(J_M(\theta))$ in $\mathbb{R}^{q+3}$, which is the tangent space to the RM at the point $M(\theta)$; hereinafter, $\mathrm{Span}(A)$ is the linear space spanned by the columns of an arbitrary matrix $A$.

The set consisting of the points of the RM, equipped with the tangent spaces at these points, is known in manifold theory [28] as the Tangent Bundle of the RM.
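Tangent spaces like those forming the bundle can be approximated numerically: finite differences of a chart give a Jacobian whose columns span the tangent space. A sketch on a toy one-dimensional chart into $\mathbb{R}^3$ (not the actual Feature modeling function):

```python
import numpy as np

def numeric_jacobian(chart, theta, eps=1e-6):
    """Forward-difference Jacobian of a chart at theta; its columns span the
    tangent space to the manifold at the point chart(theta)."""
    theta = np.asarray(theta, dtype=float)
    f0 = chart(theta)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((chart(theta + step) - f0) / eps)
    return np.stack(cols, axis=1)

helix = lambda t: np.array([np.cos(t[0]), np.sin(t[0]), t[0]])  # curve in R^3
J = numeric_jacobian(helix, [0.0])
print(J.ravel())   # ≈ [0, 1, 1], the tangent direction of the helix at t = 0
```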
IV-B Robot Localization Manifold Estimation: GSE Solution
Using the Grassmann & Stiefel Eigenmaps (GSE) method [24, 25], applied to the sample (13), we construct the solution to the Tangent bundle manifold learning problem, resulting in the following quantities:

- a sample-based area which is close to the unknown RM;

- an embedding mapping from this area to the Representation Space (RS);

- a recovery mapping from the RS back to the ambient space;

- a matrix-valued function defined on the RS,

which together provide both

- the proximity (14) between the initial and recovered points. Thanks to (14), we get a small Hausdorff distance between the RM and the three-dimensional recovered regression manifold (RRM) (15), embedded in the ambient $(q+3)$-dimensional Euclidean space;

- the proximity (16), in which the matrix-valued function plays the role of the Jacobian matrix of the recovery mapping. Thanks to (16), we get proximity between the tangent space to the RM at a point and the tangent space to the RRM (15) at the nearby recovered point. The proximity between these tangent spaces, considered as elements of the Grassmann manifold, is defined using a chosen metric on the Grassmann manifold.

Therefore, the tangent bundle of the RRM accurately approximates the tangent bundle of the RM.
Note also that the original GSE algorithm [24, 25] has a high computational complexity in the sample size $n$; the incremental version of the GSE [30] has a significantly smaller running time.
A splitting of an arbitrary vector of the ambient space into its feature component and its localization component implies the corresponding partitions (17) and (18) of the recovery mapping and of the matrix-valued function, respectively.
Using the representation (11), the embedding mapping, defined on the RM, can be written as a function (19) defined on the LS $\Theta$. Using this mapping, the RM and the embedding mapping can also be written in the form (20), where the corresponding functions are defined on the VFS.
The Jacobian matrices of these mappings (the covariant differentiation is used here) can be estimated [20] by the matrices (21) and (22), respectively. Here $A^{+}$ denotes the Moore–Penrose pseudoinverse [31] of an arbitrary matrix $A$, and the remaining factor is a certain estimator [23] of the projection matrix onto the tangent space of the VFS at the given point.
Although the GSE-based functions (23) and (24) accurately approximate the sought-for Localization and Feature modeling functions, respectively, they cannot be considered the solution to the Robot Localization problem, because the mappings (17), (18) depend on an argument (25) whose values (26) are known only at the sample points.

Based on the known values (26) of the functions (20) and (19) at the sample points, as well as on the known values of their Jacobian matrices (21), (22) at these points, estimators of these functions at arbitrary points are constructed using the Jacobian Regression method proposed in [32].
The GSE solution applied to the RM includes the construction of kernels on the VFS and on the LS. These kernels reflect not only the geometrical closeness between points but also the closeness between the tangent spaces to the RM at these points. Using these kernels, the Jacobian Regression method gives the estimators in an explicit kernel-weighted form.
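The resulting estimators have a kernel-weighted (Nadaraya–Watson type) structure; a plain zero-order sketch shows this form (the actual Jacobian Regression estimators of [32] add first-order Jacobian correction terms and use the tangent-aware kernels described above):

```python
import numpy as np

def kernel_regress(X_train, Y_train, x, bandwidth=0.5):
    """Kernel-weighted average of training outputs: each sample contributes
    with a Gaussian weight in the distance of its input to the query x."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w[:, None] * Y_train).sum(axis=0) / w.sum()

X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[0.0], [10.0], [20.0]])
print(kernel_regress(X, Y, np.array([1.0])))   # -> [10.], by symmetry
```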
IV-C Robot Localization: Final Formulas
Substituting these kernel-based estimators into formulas (23) and (24) provides the final estimator of the Localization function and the final estimator of the Feature modeling function, respectively.
Note that most known appearance-based learning methods solve only the Localization function estimation problem.
V Numerical experiments
We used the “MultiFoV” synthetic dataset [33] of a vehicle moving in a city and considered it as a testing problem for robot localization methods. The dataset consists of color images together with the vehicle position for each of them. Measurements from three different optical vision systems (perspective, fisheye and catadioptric) are presented in the dataset.
First, we estimated the intrinsic dimensionality of the data using three different popular approaches [34]: global, via IsoMap [36]; local, via the correlation dimension [37]; and pointwise, via the maximum likelihood method from [35]. The results of the estimation are presented in Table I. The true intrinsic dimension of the dataset is equal to one by design, being the dimensionality of a well-sampled continuous trajectory of the vehicle. The overestimated dimensionality produced by the local and pointwise approaches could be caused by the extremely large data dimensionality.
Table I: Intrinsic dimension estimates.

vision system \ method | global | local | pointwise
perspective            |      1 |    11 |         3
fisheye                |      1 |     7 |         3
catadioptric           |      1 |    11 |         3
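The pointwise maximum-likelihood estimate of [35] admits a compact implementation; a sketch on a synthetic curve (the neighbor count k and the test data are illustrative choices):

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel estimator: for each point, invert the mean log-ratio of
    the k-th nearest-neighbor distance to the closer neighbor distances,
    then average the per-point estimates over the sample."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    D.sort(axis=1)                     # column 0 holds the zero self-distance
    knn = D[:, 1:k + 1]                # distances to the k nearest neighbors
    inv_est = np.log(knn[:, -1:] / knn[:, :-1]).mean(axis=1)
    return float(np.mean(1.0 / inv_est))

rng = np.random.default_rng(0)
t = rng.random(500)
circle = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
print(mle_intrinsic_dim(circle))   # close to 1 for this 1-D curve in R^2
```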
We also compared the GSE approach for localization against kernel nonparametric regression (KNR). A random subsample of the data points was used as a training set, and the rest of the data points were used as a testing set. Orientation was not considered. The optimal kernel bandwidth for the KNR was estimated globally, the same for all coordinates (color channels of pixels), using leave-one-out cross-validation. The relative root mean square error (RRMSE) was used as the error measure. The results are presented in Table II. One can see that GSE provides better results. One explanation is that GSE uses an adaptive kernel bandwidth in the original space and takes first-order effects into account.
Table II: Localization RRMSE.

vision system \ method |   GSE |   KNR
perspective            | 0.045 | 0.063
fisheye                | 0.037 | 0.059
catadioptric           | 0.039 | 0.058
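The error measure of Table II can be sketched as follows (one common definition of relative RMSE, normalizing by the spread of the true values; the paper does not spell out its exact normalization):

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Root mean squared error divided by the root mean squared deviation of
    the true values from their mean: 0 is perfect, 1 matches predicting the
    mean everywhere."""
    num = np.sqrt(np.mean((y_true - y_pred) ** 2))
    den = np.sqrt(np.mean((y_true - y_true.mean(axis=0)) ** 2))
    return float(num / den)

y = np.array([[0.0], [1.0], [2.0], [3.0]])
print(rrmse(y, y + 0.1))   # constant 0.1 offset -> about 0.089
```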
The GSE algorithm not only reduces dimensionality but also constructs a reconstruction mapping (Figure 2) and estimates differential structure such as tangent spaces (Figure 3), which could be useful for velocity estimation. One can see that the reduced-and-reconstructed images are almost the same as the original ones, only a bit blurred, and that the tangent space represents the borders of moving objects.
VI Conclusions
We considered an appearance-based robot self-localization problem. Using recent manifold learning and deep learning techniques, we proposed a new geometrically motivated solution based on training data consisting of a finite set of images captured in known locations of the robot. Numerical experiments demonstrated the efficiency of the proposed approach. Further full-scale experiments under different external conditions are underway.
Acknowledgments.
E. Burnaev was supported by the RFBR grants 160100576 A and 162909649 ofi_m. A. Bernstein and Y. Yanovich were supported by the Russian Science Foundation grant (project 145000150).
References

[1]
Talluri, R., Aggarwal, J.K.: Position estimation techniques for an autonomous mobile robot — A review. In: Handbook of Pattern Recognition and Computer Vision, Chen, C.H., Pau, L.F., Wang, P.S.P.(Eds.), Ch. 4.4, pp. 769–801. Singapore: World Scientific (1993)
 [2] Borenstein, J.H., Everett, R., Feng, L., Wehe, D.: Mobile robot positioning: Sensors and techniques. J. Robot. Syst. 14, 231–249 (1997)
 [3] J. Simanek: Data fusion for localization using state estimation and machine learning. PhD thesis, Czech Technical University, Prague (2015)
 [4] BoninFont, F., Ortiz, A., Oliver, G.: Visual navigation for mobile robots: A survey. Journal of Intelligent and Robotic Systems 53(3), 263–296 (2008)
 [5] Candy, J.V.: Modelbased signal processing. John Wiley & Sons, Inc. (2006)
 [6] Olson, C.F.: Probabilistic selflocalization for mobile robots. IEEE Transactions on Robotics and Automation 16(1), 55–66 (2000)
 [7] Kröse, B.J.A., Vlassis, N., Bunschoten, R.: Omnidirectional Vision for AppearanceBased Robot Localization. Lecture Notes in Computer Science, v. 2238 “Sensor Based Intelligent Robots”, Springer, Berlin Heidelberg, 39–50 (2002)
 [8] Saito, M., Kitaguchi, K.: Appearance based robot localization using regression models. In: Proceedings of 4th IFACSymposium on Mechatronic Systems, v. 2, pp. 584–589 (2006)
 [9] Hamm, J., Lin, Y., Lee, D.D.: Learning nonlinear appearance manifolds for robot localization. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 1239–1244 (2005)
 [10] Crowley, J.L., Pourraz, F.: Continuity Properties of the Appearance Manifold for Mobile Robot Position Estimation. Image and Vision Computing 19(11), 741–752 (2001)
 [11] Pauli, J.: LearningBased Robot Vision. Lecture Notes in Computer Science, vol. 2048 “Principles and Applications”, Springer, Heidelberg, 292 pp. (2001)

[12]
A.P. Kuleshov, A.V. Bernstein, E.V. Burnaev. Mobile Robot Localization via Machine Learning. Lecture Notes in Artificial Intelligence, vol. 10358 “Machine Learning and Data Mining in Pattern Recognition”, 15 pp., Springer International Publishing AG, Verlag, (2017)
 [13] Vlassis, N., Krose, B.J.A.: Robot environment modeling via principal component regression. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’99), pp. 677–682 (1999)
 [14] Se, S., Lowe, D., Little, J.: Local and global localization for mobile robots using visual landmarks. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), pp. 414–420 (2001)
 [15] Wang, K., Wang, W., Zhuang, Y.: AppearanceBased Map Learning for Mobile Robot by Using Generalized Regression Neural Network. Lecture Notes in Computer Science, vol. 4491 “Advances in Neural Networks”, SpringerVerlag, Berlin, Heidelberg, 834–842 (2007)
 [16] Crowley, J.L., Wallner, F., Schiele, B.: Position estimation using principal components of range data. In: Proceedings of the 1998 IEEE International Conference on Robotics and Automation, vol. 4, pp. 3121–3128 (1998)

[17]
Do, H.N., Choi, J., Lim, C.Y., Maiti, T.: Appearancebased localization using Group LASSO regression with an indoor experiment. In: Proceedings of the 2015 IEEE International Conference on Advanced Intelligent Mechatronics (AIM 2015), pp. 984–989 (2015)
 [18] Do, H.N., Choi, J.: Appearancebased outdoor localization using group lasso regression. In: Proceedings of the ASME Dynamic Systems and Control Conference (DSCC 2015), vol. 3, 8 pp. (2015)
 [19] Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society. Series B (Methodological), 73(3), 273–282 (2011)
 [20] Burnaev, E., Belyaev, M., Kapushev, E.: Computationally efficient algorithm for Gaussian Processes based regression in case of structured samples. Computational Mathematics and Mathematical Physics. 56(4), 499–513 (2016)
 [21] Burnaev, E., Panov, M., Zaytsev, A.: Regression on the Basis of Nonstationary Gaussian Processes with Bayesian Regularization. Journal of Communications Technology and Electronics. 61(6), 661–671 (2016)
 [22] Burnaev, E., Zaytsev, A. Surrogate modeling of mutlifidelity data for large samples. Journal of Communications Technology and Electronics. 60(12), 1348–1355 (2016)
 [23] Kuleshov, A.P., Bernstein, A.V.: Nonlinear multioutput regression on unknown input manifold. Annals of Mathematics and Artificial Intelligence, 32 pp (2017)
 [24] Bernstein, A.V., Kuleshov, A.P.: Manifold Learning: generalizing ability and tangent proximity. Int. J. Softw. Inf. 7(3), 359–390 (2013)
 [25] Bernstein, A.V., Kuleshov, A.P.: Dimensionality Reduction in Statistical Learning, IEEE Conference Proceedings of the 13th International Conference on Machine Learning and Applications (ICMLA2014), IEEE Computer Society, pp. 330–335 (2014)
 [26] Kuleshov, A.P., Bernstein, A.V., Yanovich, Y.A.: Statistical Learning via Manifold Learning. Proceedings of the 14th International IEEE Conference on Machine Learning and Applications (ICMLA2015), IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA, USA, pp. 64–69 (2015)
 [27] Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks?. Advances in Neural Information Processing Systems 27, 3320–3328 (2014)
 [28] Lee, John M.: Introduction to Smooth Manifolds. New York: SpringerVerlag (2003)
 [29] Bernstein, A.V., Kuleshov, A.P.: Tangent Bundle Manifold Learning via Grassmann & Stiefel Eigenmaps, arXiv:1212.6031v1[cs.LG], pp. 1–25 (2012)
 [30] Kuleshov, A.P., Bernstein, A.V.: Incremental Construction of Lowdimensional Data Representations. Lecture Notes in Artificial Intelligence, vol. 9896 “Artificial Neural Networks for Pattern Recognition,” pp. 55–67. Springer Heidelberg (2016)
 [31] Golub, G.H., Van Loan, C.F.: Matrix Computation, 3rd edn. Johns Hopkins University Press, Baltimore, MD (1996)
 [32] Kuleshov, A.P., Bernstein, A.V.: Regression on Highdimensional Inputs. In: Workshops Proceedings volume of the IEEE International Conference on Data Mining (ICDM 2016), pp. 732–739. USA, IEEE Computer Society (2016)
 [33] Zhang, Z., Rebecq, H., Forster, C., Scaramuzza, D.: Benefit of Large FieldofView Cameras for Visual Odometry. IEEE International Conference on Robotics and Automation (ICRA). 1–8 (2016)
 [34] Camastra, F., Staiano, A.: Intrinsic Dimension Estimation: Advances and Open Problems. Information Sciences 328, 26–41 (2016)
 [35] Levina, E., Bickel, P.J.: Maximum Likelihood Estimation of Intrinsic Dimension. Advances in Neural Information Processing Systems. 777–784 (2005)
 [36] Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290(5500), 2319–2323 (2000)
 [37] Granata, D., Carnevale, V.: Accurate Estimation of the Intrinsic Dimension Using Graph Distances: Unraveling the Geometric Complexity of Datasets. Sci Rep 31377 (6), 1–12 (2016)