I Introduction
Visual-inertial SLAM (VISLAM) and visual-inertial odometry (VIO) approaches have received great attention from researchers in recent years. They can be applied on mobile devices, micro aerial vehicles (MAVs), ground robots and passenger cars for localization and perception. State estimation methods fall into two categories: filtering-based methods such as MSCKF [13] and optimization-based methods such as OKVIS [8]. In general, filtering-based approaches are more efficient, while optimization-based methods enjoy higher accuracy [5]. The optimization-based approaches [8], [14] typically optimize a limited number of current states to bound the amount of computation, and use marginalization to exploit previous information to better estimate the current states.
For wheeled platforms such as robots and passenger cars, the accuracy of visual-inertial navigation can be dramatically improved with the aid of wheel encoders [17], [11], [20], [21], [7], [2], [12], [16]. In some methods, such as [17], [9], [20], [21], [7], [2], [12] and [22], the IMU measurements and wheel encoder readings are preintegrated individually. These approaches either need at least two wheel encoders or require the front wheel angle measurement for preintegration, and have to deal with the problem that the angular velocity provided by the sensors on the wheels always lies within the ground plane. Other methods, such as [16], [19] and [11], jointly preintegrate the angular velocity from the IMU and the linear velocity from the wheel encoder. These approaches can still work when only one wheel encoder is available, and uneven terrain theoretically does not affect their performance. In this paper, the terms wheel encoder and odometer are used interchangeably.
However, for applications on ground vehicles, VISLAM and VIO approaches often suffer from degenerate cases, even with the aid of wheel encoders. [17] points out two degenerate cases under special motions. Firstly, the scale is unobservable when the platform moves with constant local linear acceleration. Secondly, the roll and pitch angles are unobservable when the platform has no rotational motion. Both cases are related to the accelerometer bias, which cannot be correctly estimated under the above special motions. [17] also proves that the first degenerate case is eliminated with the use of a wheel encoder. However, a straightforward derivation, presented briefly in the Appendix of this paper, shows that the second degenerate case still exists in that setting. Besides, the experimental results in [11] also indicate that the accelerometer bias cannot be correctly estimated until the first turning, even with the use of a wheel encoder. In addition to the accelerometer bias, some of the extrinsic parameters cannot be correctly estimated either when the platform has no rotational motion. [18] proved that the translational component of the camera-IMU extrinsic parameters is unobservable in a VIO system when the platform undergoes pure translation. In Section III of this paper, we give an observability analysis of the seven unobservable directions in the extrinsic parameters of an odometer-aided VISLAM system that arise from this special motion pattern, under the circumstance that the platform undergoes pure translation along a straight line. Thanks to the employment of marginalization, optimization-based VISLAM approaches can make use of previous information collected since the beginning. Therefore, once the platform performs rotational motion such as making a turn, the accelerometer bias and extrinsic parameters will be correctly estimated from then on.
Nevertheless, before the first turning, the inaccuracy resulting from the incorrectly estimated accelerometer bias and extrinsic parameters remains a problem for odometer-aided VISLAM approaches.
To relieve the inaccuracy before the first turning, [11] proposes to keep the extrinsic parameters constant during nonlinear optimization until the platform has made a turn and the estimation of the accelerometer bias has converged. We may further keep the accelerometer bias constant at zero before the first turning to relieve the inaccuracy caused by the incorrectly estimated accelerometer bias. However, since the accelerometer bias is actually not zero and the extrinsic parameters may not be very accurate, the accuracy of the trajectory before the first turning may still deteriorate, especially in outdoor scenes where the vehicle is likely to travel a long distance before the first turning.
By contrast, in this paper we propose a bidirectional trajectory computation approach to make the estimated poses before the first turning as accurate as those after it. In short, after the first turning, we additionally create a backward computation thread to recalculate the poses from the first turning back to the starting point. In this way, the accuracy of the estimated poses before the first turning no longer suffers from the lack of rotation, because the backward computation makes use of the information obtained in the first turning. Also note that by means of bidirectional trajectory computation, we can obtain a more accurate overall trajectory in real time, because every time one of the poses before the first turning is updated by the backward computation thread, the real-time trajectory is adjusted accordingly. In the following, we provide the observability analysis for the extrinsic parameters in Section III and describe the proposed bidirectional trajectory computation method in Section IV.
II Preliminaries on Odometer-Aided Visual-Inertial SLAM
The proposed bidirectional trajectory computation method is based on the odometer-aided VISLAM approach [11], a tightly-coupled approach based on sliding-window optimization in which IMU and wheel encoder measurements are fused at the preintegration stage.
II-A Frames and Notations
The coordinate frames of the sensors include the camera frame, the IMU frame and the odometer frame. The wheel encoder is installed on one rear wheel that always points forward. For the details of these frames the reader may refer to [11]. The notation involves the world frame, which is fixed since initialization; the camera frame, IMU frame and odometer frame corresponding to each image; the rotation matrix (and its quaternion form) taking a vector from one frame to another; the coordinate of the origin of one frame expressed in another frame; the velocity of the origin of a frame measured in another frame; and the accelerometer bias and gyroscope bias corresponding to each image. Moreover, the skew-symmetric matrix corresponding to a vector is used.
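As a concrete aid, the skew-symmetric matrix mentioned above can be formed with a small helper; the function name `skew` is ours, and NumPy is used purely for illustration.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([
        [0.0,  -z,   y],
        [  z, 0.0,  -x],
        [ -y,   x, 0.0],
    ])
```

This matrix also satisfies skew(v).T == -skew(v), which is often used when deriving Jacobians of rotation-related residuals.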
II-B State Estimation
The parameters to be estimated can be written as
(1) 
where the parameters comprise the states in the sliding window, the inverse depth of each landmark in the camera frame of its first observation, the camera-IMU extrinsic parameters, and the IMU-odometer extrinsic parameters, for a given number of landmarks and a given sliding-window size.
The cost function mainly comprises the reprojection error terms, the IMU-odometer error terms and the marginalization error term, and is written as
(2) 
where each reprojection error term is the reprojection residual of a landmark on an image that observes it, weighted by a uniform information matrix shared by all reprojection error terms, and each IMU-odometer error term consists of a residual vector and covariance matrix derived from the IMU-odometer preintegration results; the last term is the marginalization error term. In practice we additionally add a very small term confining a roll angle that is always unobservable because the velocity of the wheel always points forward in its own coordinate frame. This term is so small that it is neglected in the following demonstrations. The nonlinear optimization is performed with the Dogleg method in Ceres Solver [1]. For the details of state estimation, the reader may refer to [11].
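As a rough illustration of how the cost in (2) is assembled, the following sketch sums squared Mahalanobis norms of the three kinds of residuals. All names are ours, and the real system minimizes such a cost with Ceres rather than evaluating it directly.

```python
import numpy as np

def mahalanobis2(r, info):
    """Squared Mahalanobis norm r^T * Info * r."""
    r = np.asarray(r, dtype=float)
    return float(r @ info @ r)

def total_cost(reproj_residuals, reproj_info,
               imu_odo_residuals, imu_odo_covs, marg_residual):
    """Sum of the three kinds of error terms in the sliding-window cost (2):
    reprojection terms share one information matrix, IMU-odometer terms are
    weighted by the inverses of their preintegration covariances, and the
    marginalization term is kept as a plain squared norm in this sketch."""
    cost = sum(mahalanobis2(r, reproj_info) for r in reproj_residuals)
    cost += sum(mahalanobis2(r, np.linalg.inv(P))
                for r, P in zip(imu_odo_residuals, imu_odo_covs))
    m = np.asarray(marg_residual, dtype=float)
    cost += float(m @ m)
    return cost
```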
III Observability Analysis
In this section, we analyze the observability of extrinsic parameters when the platform moves along a straight line with no rotation, which is often the case for a car on a straight road before its first turning. For the observability analysis we need to consider the reprojection constraints
(3) 
as well as the IMU-odometer constraints
(4)  
(5)  
(6)  
(7)  
(8)  
(9) 
where the quantities involved are the observations of a landmark on the two images of each pair, the landmark's inverse depth in the camera frame of its first observation, the projection function and its inverse, the size of the sliding window, the set of landmarks first observed on each image, the set of other images observing a landmark, the nominal states coming from IMU-odometer preintegration, the time interval between consecutive images, and the vector part of a quaternion. For more details of (3) to (9), the reader may refer to [11] and [14].
Among the extrinsic parameters, the IMU-odometer translation is only involved in (7), which can be rewritten as
(10) 
When the platform moves along a straight line with no rotation, the relative rotation between consecutive IMU frames is the identity, in which case the IMU-odometer translation is not involved in any of the constraints, so it is unobservable.
[Figure 1: the estimation error in the roll angle of the camera-IMU rotation, the eigenvalue ratio, and the angular velocity around the Z axis (the vertical axis) at the first turning in each sequence of urban30, urban32, urban33 and urban34, from top to bottom. For each sequence the pinkish box indicates the first turning.]
Similarly, the camera-IMU extrinsic rotation and translation are only involved in (3), which can be rewritten as
(11) 
When the platform moves along a straight line with no rotation, the camera-IMU translation is unobservable because the relative rotation between images is the identity. Moreover, in such a case, the translation between every image pair points in the same direction, to which we refer as the driving direction. Hence the component of the camera-IMU rotation corresponding to the rotation around the driving direction, which is also called the roll angle in the following, is unobservable as well.
In practice, among the seven unobservable directions (three in the camera-IMU translation, three in the IMU-odometer translation and one in the camera-IMU rotation) when the platform moves along a straight line with no rotation, the roll angle in the camera-IMU rotation is the direction whose observability is most relevant to whether the platform has made a turn. To make this point, we first conduct eigenvalue decomposition on the approximated Hessian matrix used in the optimization of state estimation, i.e. the Jacobian matrix's transpose multiplied by the Jacobian matrix. We then compute the ratio of the eigenvalue whose eigenvector has the largest absolute element corresponding to the roll angle, among all eigenvectors, to the largest eigenvalue. The larger this eigenvalue ratio is, the better the roll angle can be estimated. Figure 1 displays the impact of the first turning on the estimation error in the roll angle, as well as the eigenvalue ratio. In practice, the estimation error in the roll angle means the roll angle of the rotation between the current estimated value and the real value of the camera-IMU rotation, the latter obtained offline. The experiments in this section are conducted using the state estimation method illustrated in Section II-B; the accelerometer bias is held constant and the extrinsic parameters start to be adjusted from the beginning, because our focus is on the observability of the extrinsic parameters. In each sequence in Figure 1, the system starts a dozen seconds before the first turning, and the eigenvalue ratio is computed once every 10 optimizations. It is clear from Figure 1 that after the first turning the error in the roll angle dramatically decreases and the eigenvalue ratio dramatically increases, both of which indicate that the roll angle can be estimated much better after the first turning.
Theoretically, state estimation would not be affected by the errors in the unobservable directions of the extrinsic parameters if the platform moved along a perfectly straight line with no rotation. In the real world, however, the motion of the platform is not exactly along a straight line, so the errors in the unobservable directions of the extrinsic parameters do affect the accuracy of the trajectory. Table I compares the absolute trajectory error (ATE) on four sequences in the KAIST Urban Data Set [6], either using the accurate extrinsic parameters calibrated offline or using the extrinsic parameters with an added fixed error (5 degrees in the roll angle of the camera-IMU rotation).
In each of the four sequences the car moves along an approximately straight road, and the trajectories are computed using the state estimation method illustrated in Section II-B, with the accelerometer bias and extrinsic parameters both held constant. From Table I we can infer that inaccurate extrinsic parameters degrade the accuracy of the trajectories before the first turning.
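The eigenvalue-ratio diagnostic described above can be sketched as follows, assuming a stacked Jacobian `J` of the sliding-window problem and the state index `roll_idx` of the roll angle (both names are ours):

```python
import numpy as np

def roll_observability_ratio(J, roll_idx):
    """Ratio of the eigenvalue whose eigenvector is dominated by the roll-angle
    component to the largest eigenvalue of the approximated Hessian J^T J."""
    H = J.T @ J                     # approximated Hessian used in optimization
    w, V = np.linalg.eigh(H)        # eigenvalues ascending, eigenvectors as columns
    # pick the eigenvector whose roll-angle element is largest in absolute value
    k = np.argmax(np.abs(V[roll_idx, :]))
    return w[k] / w[-1]             # small ratio => roll angle poorly constrained
```

A ratio near zero indicates the roll direction is nearly unobservable, matching the behavior seen before the first turning.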
| Sequence | urban22 | urban23 | urban24 | urban25 |
| EPs accurate | 8.8 | 11.1 | 15.0 | 8.0 |
| EPs with error | 14.2 | 13.5 | 16.2 | 9.7 |

Here EPs accurate means using the accurate extrinsic parameters calibrated offline, and EPs with error means using the extrinsic parameters with an added fixed error. ATE, absolute trajectory error; EPs, extrinsic parameters.
IV Method
Considering that for the odometer-aided VISLAM system described in Section II-B the system is unstable and the extrinsic parameters cannot be correctly estimated in the beginning, and that the accelerometer bias is unobservable until the platform makes a turn, we propose a robust method to acquire an accurate real-time trajectory.
IV-A Forward Computation Thread and Backward Computation Thread
In the very beginning, we propagate the poses and try to initialize our system. After the system is initialized as described in [11], the state estimation in the sliding window is performed as in Section II-B in the main thread, which we call the forward computation thread. In the forward computation thread, before the first turning, the extrinsic parameters are held constant, and the accelerometer bias is set to zero and held constant, in order to make the system robust in the beginning. At this stage, we limit the magnitude of the marginalization term as described in Section IV-C. Once the platform has made a turn larger than a threshold (45 degrees in our experiments) within a time interval (20 seconds in our experiments), the accelerometer bias starts to be adjusted, and as soon as the estimation of the accelerometer bias converges according to the criterion adopted in [11], the extrinsic parameters start to be adjusted. Thanks to the fact that the marginalization term contains historical information, especially the information gathered during the first turning, the accelerometer bias and extrinsic parameters engaged in state estimation soon reach their desired values. After a time interval (30 seconds in our experiments) since the extrinsic parameters begin to be adjusted, we create a new thread named the backward computation thread, while the forward computation thread keeps running. The two threads are independent of each other. Figure 2 is the schematic diagram illustrating forward and backward computation.
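The turn-detection rule used above to trigger the adjustment of the accelerometer bias (a yaw change larger than 45 degrees within 20 seconds) might be sketched as follows; representing orientation as a history of yaw samples is our simplification.

```python
import numpy as np

def has_made_turn(yaw_history, angle_thresh_deg=45.0, window_sec=20.0):
    """yaw_history: list of (timestamp_sec, yaw_rad) samples in time order.
    Returns True once the yaw has changed by more than angle_thresh_deg
    within some interval no longer than window_sec."""
    thresh = np.deg2rad(angle_thresh_deg)
    for i, (t_i, yaw_i) in enumerate(yaw_history):
        for t_j, yaw_j in yaw_history[i + 1:]:
            if t_j - t_i > window_sec:
                break
            # wrap the yaw difference into (-pi, pi]
            d = (yaw_j - yaw_i + np.pi) % (2.0 * np.pi) - np.pi
            if abs(d) > thresh:
                return True
    return False
```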
The backward computation thread also performs state estimation in a sliding window, where the parameters are as in (1) and the cost function is as in (2). When the backward computation thread is created, the values of its parameters, except for the landmark inverse depths, are copied from the forward computation thread, and its IMU-odometer terms and marginalization term are identical to their counterparts in the forward computation thread. For the backward computation thread, the reprojection errors still take the form of (3), but the first observation in (3) means the observation on the image with the latest timestamp, instead of the one with the earliest timestamp as in the forward computation thread. Meanwhile, the inverse depth of each landmark is shifted as
(12) 
where the quantities involved are the inverse depths of the landmark after and before shifting, and the indexes of the earliest and latest images in the sliding window that observe the landmark; the meanings of the other symbols are the same as those in (3).
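Under the common convention that inverse depth is the reciprocal of the depth along the camera's optical axis, the re-anchoring in (12) can be sketched as below; the paper's exact form may differ, and all names here are ours.

```python
import numpy as np

def shift_inverse_depth(inv_depth, bearing_first, R_le, t_le):
    """Re-anchor a landmark's inverse depth from the earliest observing camera
    frame to the latest one (a sketch of the role of eq. (12)).

    inv_depth     : inverse depth in the earliest camera frame
    bearing_first : normalized image point pi^{-1}(u) in the earliest frame, shape (3,)
    R_le, t_le    : rotation/translation taking earliest-frame coords to the latest frame
    """
    p_first = bearing_first / inv_depth    # 3D point in the earliest camera frame
    p_last = R_le @ p_first + t_le         # the same point in the latest camera frame
    return 1.0 / p_last[2]                 # assuming inverse depth = 1/z
```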
Parameters can be correctly estimated in the backward computation thread from the moment it starts, because of the information contained in the marginalization error term. Figures 3(a) and 3(b) show the contrast between the sliding windows in the forward and backward computation threads. In the following we illustrate how the backward computation thread operates according to Figure 3(b). Suppose that at a certain moment there are several frames in the sliding window. In the backward computation thread, every time the nonlinear optimization in the sliding window finishes, the next frame to be inputted is the one previous to the frame whose timestamp is the earliest in the sliding window. The IMU-odometer preintegration between these two frames is computed, and the initial value of the pose and velocity of the frame to be newly inputted is propagated using the IMU measurements between the two frames. Note that although the IMU-odometer preintegration between these two frames has been performed in the past in the forward computation thread, recomputation is needed because the estimated values of the IMU biases and the extrinsic parameters, which are engaged in preintegration, have changed. Next, if the frame with the second earliest timestamp is not a keyframe, it is discarded; otherwise, the frame with the latest timestamp is marginalized. The criterion to judge whether an image frame is a keyframe is the same as that in [14]. The backward computation terminates when the first frame in the data sequence has been inputted into the sliding window. The IMU and wheel encoder measurements and the feature points used in backward computation are recorded previously during the forward computation.
When the pose of a certain frame is estimated in the backward computation thread, it substitutes the corresponding pose previously estimated in the forward computation thread, because the poses estimated in the backward computation thread are more accurate.
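The backward sliding-window loop described above can be summarized in pseudocode-style Python; every callable is a placeholder standing in for the corresponding module of the real system.

```python
def backward_pass(earliest_id, window, is_keyframe, optimize, preintegrate, propagate):
    """Sketch of the backward computation loop.
    window : list of frame ids currently in the sliding window, earliest first."""
    while window[0] > earliest_id:
        optimize(window)                  # nonlinear optimization over the current window
        new_id = window[0] - 1            # frame previous to the earliest one in the window
        preintegrate(new_id, window[0])   # recompute IMU-odometer preintegration
        propagate(new_id)                 # initial pose/velocity propagated via IMU
        window.insert(0, new_id)          # the newly inputted frame becomes the earliest
        if not is_keyframe(window[1]):
            window.pop(1)                 # discard the non-keyframe
        else:
            window.pop()                  # marginalize the frame with the latest timestamp
    optimize(window)                      # final optimization once the first frame is in
    return window
```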
IV-B Computation of Real-Time Trajectory
Every time the backward computation thread updates the pose of a certain frame, we compute a continuous real-time trajectory. Consider the frame that has just been updated by the backward computation thread. For the frames before this frame, the pose in the real-time trajectory is just what was computed in Section IV-A, and for the frames after this frame, the pose is computed as
(13) 
where the pose of each later frame is obtained by composing its pose computed in Section IV-A with the relative transform between the updated frame's poses after and before being updated by the backward computation thread.
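Our reading of (13) is that each later frame's forward-computed pose is re-anchored by the correction applied to the most recently back-updated frame; with 4x4 homogeneous transforms this is:

```python
import numpy as np

def adjust_realtime_pose(T_j_before, T_j_after, T_k_forward):
    """For a frame k after the back-updated frame j, re-anchor its
    forward-computed pose by the correction applied to frame j
    (a sketch of eq. (13); poses are 4x4 homogeneous transforms)."""
    correction = T_j_after @ np.linalg.inv(T_j_before)
    return correction @ T_k_forward
```

If the backward update leaves frame j unchanged, the correction is the identity and the real-time trajectory is unaffected.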
IV-C Bounded Marginalization Term
The marginalization residual is a linear function of the step that updates the parameters, with coefficients computed in the marginalization process. We have observed that the marginalization error, and thus the total error, keeps growing before the first turning when the accelerometer bias and extrinsic parameters are held constant. In order to reduce the accumulation of the error caused by the inaccurate extrinsic parameters and accelerometer bias in the marginalization error term, and to prevent this error from dominating the state estimation, before the first turning we multiply the coefficients of the marginalization term by a ratio once the ratio of the marginalization error to the total error in (2) after optimization rises beyond a threshold. In our experiments the threshold is set to 0.85 and the ratio to 0.4.
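The bounding rule can be sketched as follows, writing the linearized marginalization prior as a matrix-vector pair; the names are ours, and `tau` and `alpha` match the values reported above.

```python
import numpy as np

def bound_marginalization(H_p, b_p, marg_error, total_error, tau=0.85, alpha=0.4):
    """Scale down the linearized marginalization prior (H_p, b_p) once the
    marginalization error exceeds a fraction tau of the total cost;
    applied only before the first turning."""
    if total_error > 0.0 and marg_error / total_error > tau:
        return alpha * H_p, alpha * b_p
    return H_p, b_p
```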
V Experiments
We evaluate the effect of the proposed bidirectional trajectory computation method on the KAIST Urban Data Set [6], a publicly available dataset containing data collected in complex urban scenes on a rear-wheel-drive passenger car. The sensors in the dataset include stereo cameras, one IMU and two wheel encoders mounted on the two rear wheels. The frequencies of the captured images, IMU measurements and wheel encoder measurements are 10 Hz, 100 Hz and 100 Hz respectively. The proposed approach is compared with the stereo-inertial version of the state-of-the-art VISLAM system VINS-Fusion [15], the standard VIO [10], and the odometer-aided VISLAM approaches [20], [21] and [11]. The proposed approach and [11] use a monocular camera, one IMU and one wheel encoder. The approaches [20] and [21], as reported in their papers, use a monocular camera, one IMU and two wheel encoders. The stereo-inertial version of VINS-Fusion uses stereo cameras and one IMU. All the experiments presented are performed on a PC with an Intel Core i7 3.6 GHz 6-core CPU and 64 GB memory. The extrinsic parameters provided in the dataset, which may not be very accurate, are adopted as initial values.
V-A Average Positioning Error by Aligning the Starting Frame
The primary concern of our bidirectional trajectory computation method is to improve the accuracy at the initial stage, which matters a lot when only the position and orientation of the vehicle at the starting point are known. In our first evaluation, we align the position and orientation of the starting image frames of the trajectory resulting from each VISLAM approach and the ground truth trajectory, and compute the average positioning error over every frame in the data sequence. The practice of aligning the starting frames is also adopted in the evaluation criteria of the KITTI dataset [3]. Here our proposed approach is mainly compared with [11], on which our approach is based. The work [11] optimizes the accelerometer bias from the beginning and fixes the extrinsic parameters until the platform has made a turn and the estimation of the accelerometer bias has converged, in order to reduce the instability in the very beginning. To make an exhaustive comparison of strategies for dealing with the accelerometer bias and extrinsic parameters, the two instability factors, we derive several adapted versions of [11]: (i) both the accelerometer bias and extrinsic parameters are fixed until the first turning (FAFE); (ii) the extrinsic parameters are optimized from the beginning, and the accelerometer bias is fixed until the first turning (FAOE); (iii) both the accelerometer bias and extrinsic parameters are optimized from the beginning (OAOE). Our proposed approach is first compared against [11] (OAFE) and its three adapted versions above, as well as VINS-Fusion [15]. To make a fair comparison, we select the image frame at which the vehicle has traveled 100 meters as the starting frame, to avoid being affected by erroneous pose estimations from some approaches in the very beginning. This comparison is made on all 15 sequences with stereo cameras and with complexity level 3 (middle) or level 4 (high) in [6], namely urban25-urban39.
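The starting-frame-aligned error metric can be sketched as follows, representing poses as 4x4 homogeneous transforms (the function name is ours):

```python
import numpy as np

def avg_error_start_aligned(T_est, T_gt):
    """Average positioning error after aligning only the starting frames:
    the estimated trajectory is moved by the rigid transform that maps the
    first estimated pose onto the first ground-truth pose.
    T_est, T_gt: lists of 4x4 homogeneous poses of equal length."""
    A = T_gt[0] @ np.linalg.inv(T_est[0])   # alignment fixing the starting frame
    errs = [np.linalg.norm((A @ Te)[:3, 3] - Tg[:3, 3])
            for Te, Tg in zip(T_est, T_gt)]
    return float(np.mean(errs))
```

Unlike a full-trajectory alignment, this metric lets early orientation errors propagate into position errors along the whole sequence, which is exactly the sensitivity we want when the accuracy before the first turning is in question.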
The comparison of average positioning error by aligning the starting frame is shown in Table II.
| Sequence | Proposed | FAFE | FAOE | OAOE | OAFE [11] | VINS-Fusion [15] | Trajectory length |
| urban25* | 11.6 | 11.3 | 72.7 | 62.1 | 15.3 | 862.7 | 2.5 km |
| urban26 | 26.6 | 41.0 | 42.8 | 29.8 | 41.7 | 52.9 | 4.0 km |
| urban27 | 11.7 | 49.5 | 73.4 | 91.9 | 44.5 | 63.1 | 5.4 km |
| urban28 | 35.2 | 61.5 | 47.7 | 27.7 | 104.8 | 103.4 | 11.5 km |
| urban29 | 50.4 | 44.3 | 12.1 | 13.3 | 40.1 | 122.6 | 3.6 km |
| urban30 | 29.8 | 34.6 | 36.8 | 45.5 | 43.0 | – | 6.0 km |
| urban31 | 712.7 | 1107.6 | 995.9 | 1072.0 | 1229.5 | 1738.9 | 11.4 km |
| urban32 | 39.2 | 407.5 | 140.6 | 149.1 | 422.6 | 257.9 | 7.1 km |
| urban33 | 37.4 | 177.3 | 81.9 | 130.9 | 221.3 | 696.7 | 7.6 km |
| urban34 | 64.5 | 98.8 | 160.9 | 122.6 | 168.7 | – | 7.8 km |
| urban35* | 58.6 | 49.3 | 290.9 | 253.4 | 57.5 | – | 3.2 km |
| urban36* | 345.9 | 333.1 | 221.5 | 281.0 | 283.7 | – | 9.0 km |
| urban37* | 421.6 | 371.0 | 1989.0 | 2220.8 | 677.0 | 1125.7 | 11.8 km |
| urban38 | 33.3 | 123.4 | 151.9 | 44.8 | 101.3 | 134.6 | 11.4 km |
| urban39 | 12.0 | 953.7 | 22.0 | 36.8 | 42.1 | – | 11.0 km |

Here Proposed means the proposed approach in this paper and '–' means failure. The sequences marked with '*' do not contain turnings, so the difference in accuracy on those sequences between the proposed approach and FAFE results only from restricting the marginalization error as described in Section IV. FAFE, fixing accelerometer bias and fixing extrinsic parameters; FAOE, fixing accelerometer bias and optimizing extrinsic parameters; OAOE, optimizing accelerometer bias and optimizing extrinsic parameters; OAFE, optimizing accelerometer bias and fixing extrinsic parameters.
Table II indicates that the proposed approach outperforms all the other five approaches on 9 of the 15 sequences. Among the remaining six sequences, urban25, urban35, urban36 and urban37 do not contain turnings, so the proposed bidirectional trajectory computation does not come into play on these sequences. The accuracy of the proposed approach on these four sequences is generally higher than FAOE, OAOE and the stereo VISLAM [15], and comparable with OAFE [11], but slightly lower than FAFE. That is because the manipulation described in Section IV-C can cause information loss; when the trajectory contains no turnings, the only difference between the proposed approach and FAFE lies in this manipulation. However, in view of the good performance of the proposed approach on the other sequences with turnings, where the bidirectional trajectory computation does come into play, the benefit of the manipulation in Section IV-C dramatically outweighs its cost.
Generally speaking, the accuracy of the proposed approach is higher than that of [11] and its adapted versions, which deal with the accelerometer bias and extrinsic parameters differently.
V-B Absolute Trajectory Error (ATE) Comparison
We also make an extensive comparison with more approaches, including the odometer-aided VISLAM approaches [20], [21] and [11], the stereo VISLAM system VINS-Fusion [15], and the standard VIO [10]. The comparison is made in terms of absolute trajectory error (ATE), which is the root mean square error (RMSE) of the positions after a 6-DoF trajectory alignment with the ground truth. The experiments are conducted on the sequences urban26, urban28, urban38 and urban39, because only the ATEs of these four sequences are reported in [21]. The comparison results are shown in Table III.
Table III indicates that the proposed approach is clearly more accurate than other approaches in terms of ATE on those four sequences.
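The ATE computation with a 6-DoF alignment can be sketched with a standard Kabsch/Umeyama rigid alignment of the position sequences (no scale factor, since both trajectories are metric):

```python
import numpy as np

def ate_rmse(p_est, p_gt):
    """ATE: RMSE of positions after a 6-DoF (rotation + translation) alignment
    of the estimated trajectory to the ground truth.
    p_est, p_gt: (N, 3) arrays of positions."""
    mu_e, mu_g = p_est.mean(axis=0), p_gt.mean(axis=0)
    # Kabsch/Umeyama: optimal rotation from the cross-covariance of the
    # centered point sets
    C = (p_gt - mu_g).T @ (p_est - mu_e)
    U, _, Vt = np.linalg.svd(C)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    aligned = p_est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - p_gt) ** 2, axis=1))))
```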
V-C Evaluation of Effects on Estimating Accelerometer Bias and Extrinsic Parameters
We examine the effects of the proposed bidirectional trajectory computation approach on estimating the accelerometer bias and the extrinsic parameters, in order to reveal why this approach improves the accuracy. The proposed approach is compared with the approach OAOE in Section V-A, which optimizes the accelerometer bias and extrinsic parameters from the beginning and only performs forward computation. Figure 4 shows the comparison of the estimated values of the accelerometer bias and the estimation error in the roll angle of the camera-IMU rotation between the above two approaches, at the first turning in each sequence of urban27, urban28, urban30 and urban34. As in Section III, the system starts a dozen seconds before the first turning in each sequence instead of starting from the very beginning. Figure 4 indicates that the estimation error in the roll angle is much smaller using the proposed approach, and that the estimated value of the accelerometer bias is more stable over time, which is more reasonable because the accelerometer bias is a slowly time-varying quantity. Besides, both approaches can estimate the z-component of the accelerometer bias well. That is because both of the two unobservable directions of the accelerometer bias before the first turning, which compose an orthogonal basis of a 2D subspace, have very small values in their respective z-components, considering that the vehicle's orientation involves a rotation mainly around the Z axis and the gravity direction is exactly along the Z axis. The ground-truth extrinsic rotation is obtained offline, and the method to compute the error in the roll angle is the same as that in Section III.
V-D Computation of Real-Time Trajectory
To illustrate the effect of computing the real-time trajectory, we take the sequence urban32 as an example. Figures 5(a)-5(d) display the real-time trajectories at 0, 3, 6 and 9 minutes after the backward computation starts, respectively, compared with the ground truth trajectory. We can see that as the backward computation proceeds, the real-time trajectory gradually approaches the ground truth trajectory. The performance on this sequence is shown in the supplementary video.
| Sequence | Proposed | [11] | [21] | [20] | VINS-Fusion [15] | [10] | Trajectory length |
| urban26 | 9.8 | 11.9 | 14.8 | 16.1 | 22.5 | 32.8 | 4.0 km |
| urban28 | 19.8 | 27.8 | 25.0 | 33.1 | 93.3 | 34.7 | 11.5 km |
| urban38 | 14.0 | 16.0 | 33.5 | 43.0 | 90.0 | 55.5 | 11.4 km |
| urban39 | 7.2 | 8.0 | 21.3 | 24.0 | 33.4 | – | 11.0 km |
VI Conclusion
In this paper, we propose a bidirectional trajectory computation method for VISLAM aided with a wheel encoder. Firstly, we perform an observability analysis of the degenerate case that an odometer-aided VISLAM system deployed on a car may encounter before the first turning. Secondly, we describe the proposed backward computation thread, which refines the poses before the first turning, as well as the method to adjust the real-time trajectory. Experimental results show the higher accuracy of the whole trajectory, the correctly estimated parameters before the first turning, and the effects of the real-time trajectory adjustment. Although a wheel encoder is used in this paper, we believe that the proposed bidirectional trajectory computation method can also be applied to VISLAM systems that are not aided by wheel encoders.
Appendix
Hereafter we prove that the roll and pitch angles are unobservable when the platform has no rotational motion, even if wheel encoders are used. For brevity, the proof is given by extending the derivations in [17]. Following [17], consider the direction
(14) 
where we only consider the 3D position of a single landmark, as was actually done in [17]. For any block row (see (24) in [17]) of the observability matrix as in (39) of [4], the work [17] has already shown that its product with this direction vanishes if the platform has no rotational motion, i.e.
(15) 
where, slightly abusing the symbol, the IMU frame at any moment is denoted. Therefore, under these circumstances, whether the roll and pitch angles are unobservable depends on whether the corresponding product also vanishes for each of the extra block rows of the observability matrix provided by the wheel encoder measurements (see (38) in [17]). Each such block row takes the form of
(16) 
where the three factors are as in (46), (104) and (112) of [4] respectively. (There is a sign typo in (104) of [4]; the corrected form, translated into the notation of this paper, is used here.) When (15) is satisfied, these factors simplify accordingly, with the time interval between consecutive images involved. Hence
(17) 
In such a case, considering (15) and (17), we obtain
(18) 
Hence the roll and pitch angles are still unobservable when the platform has no rotational motion, even if wheel encoders are used.
References
 [1] Sameer Agarwal, Keir Mierle, and Others. Ceres Solver. http://ceres-solver.org, 2017.
 [2] Zhiqiang Dang, Tianmiao Wang, and Fumin Pang. Tightly-coupled data fusion of VINS and odometer based on wheel slip estimation. In 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1613–1619. IEEE, 2018.
 [3] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3354–3361. IEEE, 2012.
 [4] Joel A Hesch, Dimitrios G Kottas, Sean L Bowman, and Stergios I Roumeliotis. Consistency analysis and improvement of vision-aided inertial navigation. IEEE Transactions on Robotics, 30(1):158–176, 2014.
 [5] G. Huang. Visual-inertial navigation: A concise review. In 2019 International Conference on Robotics and Automation (ICRA), pages 9572–9582. IEEE, May 2019.
 [6] Jinyong Jeong, Younggun Cho, Young-Sik Shin, Hyunchul Roh, and Ayoung Kim. Complex urban dataset with multi-level sensors from highly diverse urban environments. The International Journal of Robotics Research, 38(6):642–657, 2019.
 [7] Rong Kang, Lu Xiong, Mingyu Xu, Junqiao Zhao, and Peizhi Zhang. Vins-vehicle: A tightly-coupled vehicle dynamics extension to visual-inertial state estimator. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 3593–3600. IEEE, 2019.
 [8] Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart, and Paul Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3):314–334, 2015.
 [9] Dongxuan Li, Kevin Eckenhoff, Kanzhi Wu, Yue Wang, Rong Xiong, and Guoquan Huang. Gyro-aided camera-odometer online calibration and localization. In 2017 American Control Conference (ACC), pages 3579–3586. IEEE, 2017.
 [10] Mingyang Li and Anastasios I Mourikis. Optimization-based estimator design for vision-aided inertial navigation. In 2013 Robotics: Science and Systems, pages 241–248, 2013.
 [11] Jinxu Liu, Wei Gao, and Zhanyi Hu. Visual-inertial odometry tightly coupled with wheel encoder adopting robust initialization and online extrinsic calibration. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5391–5397. IEEE, 2019.

[12] Fangwu Ma, Jinzhu Shi, Yu Yang, Jinhang Li, and Kai Dai. Ack-msckf: Tightly-coupled ackermann multi-state constraint kalman filter for autonomous vehicle localization. Sensors, 19(21):4816, 2019.
 [13] Anastasios I Mourikis and Stergios I Roumeliotis. A multi-state constraint kalman filter for vision-aided inertial navigation. In 2007 IEEE International Conference on Robotics and Automation, pages 3565–3572. IEEE, 2007.
 [14] Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.
 [15] Tong Qin, Jie Pan, Shaozu Cao, and Shaojie Shen. A general optimization-based framework for local odometry estimation with multiple sensors. arXiv preprint arXiv:1901.03638, 2019.
 [16] Meixiang Quan, Songhao Piao, Minglang Tan, and Shi-Sheng Huang. Tightly-coupled monocular visual-odometric slam using wheels and a mems gyroscope. IEEE Access, 7:97374–97389, 2019.
 [17] Kejian J Wu, Chao X Guo, Georgios Georgiou, and Stergios I Roumeliotis. Vins on wheels. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5155–5162. IEEE, 2017.
 [18] Yulin Yang, Patrick Geneva, Kevin Eckenhoff, and Guoquan Huang. Degenerate motion analysis for aided ins with online spatial and temporal sensor calibration. IEEE Robotics and Automation Letters, 4(2):2070–2077, 2019.
 [19] Wenlong Ye, Renjie Zheng, Fangqiang Zhang, Ziyou Ouyang, and Yong Liu. Robust and efficient vehicles motion estimation with low-cost multi-camera and odometer-gyroscope. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4490–4496. IEEE, 2019.
 [20] Mingming Zhang, Yiming Chen, and Mingyang Li. Vision-aided localization for ground robots. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2455–2461. IEEE, 2019.
 [21] Mingming Zhang, Xingxing Zuo, Yiming Chen, and Mingyang Li. Localization for ground robots: On manifold representation, integration, reparameterization, and optimization. arXiv preprint arXiv:1909.03423, 2019.
 [22] Xingxing Zuo, Mingming Zhang, Yiming Chen, Yong Liu, Guoquan Huang, and Mingyang Li. Visual-inertial localization for skid-steering robots with kinematic constraints. arXiv preprint arXiv:1911.05787, 2019.