Bidirectional Trajectory Computation for Odometer-Aided Visual-Inertial SLAM

02/01/2020
by   Jinxu Liu, et al.

Odometer-aided visual-inertial SLAM systems typically perform well for the navigation of wheeled platforms, but they usually suffer from degenerate cases before the first turning. In this paper, we first perform an observability analysis w.r.t. the extrinsic parameters before the first turning, which complements the existing results of observability analyses. Second, inspired by this observability analysis, we propose a bidirectional trajectory computation method, by which the poses before the first turning are refined in a backward computation thread and the real-time trajectory is adjusted accordingly. Experimental results show that our proposed method not only solves the problem of the unobservability of the accelerometer bias and extrinsic parameters before the first turning, but also produces more accurate trajectories than state-of-the-art approaches.



I Introduction

Visual-inertial SLAM (VI-SLAM) and visual-inertial odometry (VIO) approaches have received great attention from researchers in recent years. They can be applied on mobile devices, micro aerial vehicles (MAVs), ground robots and passenger cars for localization and perception. State estimation methods fall into two categories: filtering-based methods such as MSCKF [13] and optimization-based methods such as OKVIS [8]. In general, filtering-based approaches are more efficient, while optimization-based methods achieve higher accuracy [5]. Optimization-based approaches [8], [14] typically optimize a limited number of current states to bound the amount of computation, and use marginalization to exploit previous information when estimating the current states.

For wheeled platforms such as robots and passenger cars, the accuracy of visual-inertial navigation can be dramatically improved with the aid of wheel encoders [17], [11], [20], [21], [7], [2], [12], [16]. In some methods such as [17], [9], [20], [21], [7], [2], [12] and [22], the IMU measurements and wheel encoder readings are pre-integrated individually. These approaches either need at least two wheel encoders or require the front wheel angle measurement for pre-integration, and have to deal with the problem that the angular velocity provided by the sensors on the wheels always lies within the ground plane. Other methods such as [16], [19] and [11] jointly pre-integrate the angular velocity from the IMU and the linear velocity from the wheel encoder, as sketched below. These approaches can still work when only one wheel encoder is available, and uneven terrain theoretically has no impact on their performance. In this paper we use the terms wheel encoder and odometer interchangeably.
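To make the joint pre-integration idea concrete, the following is a minimal Python/NumPy sketch of integrating gyroscope angular velocity together with wheel-encoder linear velocity. It illustrates the general technique rather than the exact pre-integration of [11]; bias handling, noise terms and covariance propagation are omitted, and the function names are ours.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x such that skew(v) @ u = v x u."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(phi):
    """Rodrigues' formula: rotation matrix for a rotation vector phi."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3) + skew(phi)
    a = phi / theta
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * skew(a))

def preintegrate(gyro, wheel_speed, dt, R_bo=np.eye(3)):
    """Accumulate the relative rotation and translation between two images.

    gyro:        (N, 3) body-frame angular velocities from the IMU
    wheel_speed: (N,) forward speeds from a single wheel encoder
    dt:          sample interval in seconds
    R_bo:        odometer-to-IMU rotation (extrinsic parameter)
    """
    R = np.eye(3)    # rotation from the current body frame to the first one
    p = np.zeros(3)  # translation expressed in the first body frame
    for w, s in zip(gyro, wheel_speed):
        v_o = np.array([s, 0.0, 0.0])  # wheel velocity always points forward
        p += R @ (R_bo @ v_o) * dt     # translation from the odometer speed
        R = R @ so3_exp(w * dt)        # rotation from the gyroscope
    return R, p
```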

However, for applications on ground vehicles, VI-SLAM and VIO approaches often suffer from degenerate cases, even with the aid of wheel encoders. [17] points out two degenerate cases under special motions. First, the scale is unobservable when the platform moves with constant local linear acceleration. Second, the roll and pitch angles are unobservable when the platform has no rotational motion. Both cases are related to the accelerometer bias, which cannot be correctly estimated under these special motions. [17] also proves that the first degenerate case is eliminated by the use of a wheel encoder. However, a straightforward derivation, presented briefly in the Appendix of this paper, shows that the second degenerate case still persists in that setting. Besides, the experimental results in [11] also indicate that the accelerometer bias cannot be correctly estimated until the first turning, even with the use of a wheel encoder. In addition to the accelerometer bias, some of the extrinsic parameters cannot be correctly estimated either when the platform has no rotational motion. [18] proved that the translational component of the camera-IMU extrinsic parameters is unobservable in a VIO system when the platform undergoes pure translation. In Section III of this paper, we give an observability analysis of the seven unobservable directions in the extrinsic parameters of an odometer-aided VI-SLAM system when the platform undergoes pure translation along a straight line. Thanks to the employment of marginalization, optimization-based VI-SLAM approaches can make use of all information collected since the beginning. Therefore, once the platform performs rotational motion such as making a turn, the accelerometer bias and extrinsic parameters are correctly estimated from then on. Nevertheless, before the first turning, the inaccuracy resulting from the incorrectly estimated accelerometer bias and extrinsic parameters remains a problem for odometer-aided VI-SLAM approaches.

To relieve the inaccuracy before the first turning, [11] proposes to keep the extrinsic parameters constant during nonlinear optimization until the platform has made a turn and the estimation of the accelerometer bias has converged. One may additionally keep the accelerometer bias constant at zero before the first turning to relieve the inaccuracy caused by its incorrect estimation. However, since the accelerometer bias is actually not zero and the extrinsic parameters may not be very accurate, the accuracy of the trajectory before the first turning may still deteriorate, especially in outdoor scenes where the vehicle is likely to travel a long distance before the first turning.

By contrast, in this paper we propose a bidirectional trajectory computation approach that makes the estimated poses before the first turning as accurate as those after it. In short, after the first turning, we create an additional backward computation thread to recalculate the poses from the first turning back to the starting point. In this way, the accuracy of the estimated poses before the first turning no longer suffers from the lack of rotation, because the backward computation makes use of the information obtained during the first turning. Also note that by means of bidirectional trajectory computation, we obtain a more accurate overall trajectory in real time: every time one of the poses before the first turning is updated by the backward computation thread, the real-time trajectory is adjusted accordingly. In the following, we provide the observability analysis for the extrinsic parameters in Section III and describe the proposed bidirectional trajectory computation method in Section IV.

II Preliminaries on Odometer-Aided Visual-Inertial SLAM

The proposed bidirectional trajectory computation method is based on the odometer-aided VI-SLAM approach [11], which is a tightly-coupled approach based on sliding window optimization, where IMU and wheel encoder measurements are fused at the pre-integration stage.

II-A Frames and Notations

The coordinate frames of the sensors include the camera frame, the IMU frame and the odometer frame. The wheel encoder is installed on one rear wheel that always points forward. For the details of these frames the reader may refer to [11]. We use $w$ to denote the world frame that is fixed since initialization, and $c_k$, $b_k$ and $o_k$ to denote the camera frame, IMU frame and odometer frame corresponding to the $k$-th image. Let $\mathbf{R}_{XY}$ denote the rotation matrix that takes a vector in frame $Y$ to frame $X$, and $\mathbf{q}_{XY}$ its quaternion form. $\mathbf{p}^{X}_{Y}$ is the coordinate of the origin point of frame $Y$ in frame $X$, and $\mathbf{v}^{X}_{Y}$ is the velocity of the origin point of frame $Y$ measured in frame $X$. Let $\mathbf{b}_{a_k}$ and $\mathbf{b}_{g_k}$ denote the accelerometer bias and gyroscope bias corresponding to the $k$-th image respectively. Moreover, we use $[\cdot]_{\times}$ to denote the skew-symmetric matrix corresponding to a vector.

II-B State Estimation

The parameters to be estimated can be written as

$$\mathcal{X} = \left[\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N, \lambda_0, \lambda_1, \ldots, \lambda_M, \mathbf{q}_{bc}, \mathbf{p}_{bc}, \mathbf{q}_{bo}, \mathbf{p}_{bo}\right], \qquad \mathbf{x}_k = \left[\mathbf{p}^{w}_{b_k}, \mathbf{v}^{w}_{b_k}, \mathbf{q}_{wb_k}, \mathbf{b}_{a_k}, \mathbf{b}_{g_k}\right] \tag{1}$$

where $\lambda_l$ is the inverse depth of one landmark in the camera frame, $\mathbf{q}_{bc}$ and $\mathbf{p}_{bc}$ are the camera-IMU extrinsic parameters, while $\mathbf{q}_{bo}$ and $\mathbf{p}_{bo}$ are the IMU-odometer extrinsic parameters. $M$ is the number of landmarks and $N$ is the size of the sliding window.

The cost function mainly comprises reprojection error terms, IMU-odometer error terms and the marginalization error term, which writes as

$$\min_{\mathcal{X}} \left\{ \left\|\mathbf{r}_m(\mathcal{X})\right\|^2 + \sum_{l}\sum_{j \in S_l} \mathbf{r}_{l,j}^{T}\,\boldsymbol{\Lambda}\,\mathbf{r}_{l,j} + \sum_{k=0}^{N-1} \mathbf{r}_{k,k+1}^{T}\,\boldsymbol{\Sigma}_{k,k+1}^{-1}\,\mathbf{r}_{k,k+1} \right\} \tag{2}$$

where $\mathbf{r}_{l,j}$ is the reprojection residual of landmark $l$ on image $j$, and $\boldsymbol{\Lambda}$ is the uniform information matrix for all reprojection error terms. $S_l$ is the set of images on which landmark $l$ appears. $\mathbf{r}_{k,k+1}$ and $\boldsymbol{\Sigma}_{k,k+1}$ are the residual vector and covariance matrix of the IMU-odometer terms respectively, which are derived from the IMU-odometer pre-integration results. $\mathbf{r}_m$ is the marginalization error term. In practice we additionally add a very small term confining the roll angle in $\mathbf{q}_{bo}$, which is always an unobservable angle because the velocity of the wheel always points forward in its own coordinate frame. This term is so small that it is neglected in the following demonstrations. The nonlinear optimization is performed using the Dogleg method in Ceres Solver [1]. For the details of state estimation, the reader may refer to [11].
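As an illustration of how the terms in (2) combine, here is a hedged sketch of evaluating the total cost, assuming the residuals have already been computed. In the actual system this objective is minimized by Ceres rather than evaluated by hand, and all names below are illustrative.

```python
import numpy as np

def total_cost(reproj_residuals, info_reproj, imu_odo_residuals,
               imu_odo_covs, marg_residual):
    """Weighted sum of squared residuals, mirroring the structure of (2)."""
    cost = float(marg_residual @ marg_residual)     # marginalization term
    for r in reproj_residuals:                      # visual terms, shared
        cost += float(r @ info_reproj @ r)          # information matrix
    for r, cov in zip(imu_odo_residuals, imu_odo_covs):
        cost += float(r @ np.linalg.solve(cov, r))  # Mahalanobis norm
    return cost
```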

III Observability Analysis

In this section, we analyze the observability of extrinsic parameters when the platform moves along a straight line with no rotation, which is often the case for a car on a straight road before its first turning. For the observability analysis we need to consider the reprojection constraints

$$\frac{1}{\lambda^{l}_{j}}\,\pi^{-1}\!\left(\hat{\mathbf{u}}^{l}_{j}\right) = \mathbf{R}_{cb}\left(\mathbf{R}_{b_j w}\left(\mathbf{R}_{w b_i}\left(\mathbf{R}_{bc}\,\frac{\pi^{-1}\!\left(\hat{\mathbf{u}}^{l}_{i}\right)}{\lambda_l} + \mathbf{p}_{bc}\right) + \mathbf{p}^{w}_{b_i} - \mathbf{p}^{w}_{b_j}\right) - \mathbf{p}_{bc}\right) \tag{3}$$

as well as the IMU-odometer constraints

(4)
(5)
(6)
(7)
(8)
(9)

where $\hat{\mathbf{u}}^{l}_{i}$ and $\hat{\mathbf{u}}^{l}_{j}$ are the observations of landmark $l$ on image $i$ and image $j$ respectively. $\lambda_l$ is the inverse depth of landmark $l$ in the camera frame where its first observation happens, and $\lambda^{l}_{j}$ is its inverse depth in camera frame $c_j$. $\pi(\cdot)$ is the projection function, and $\pi^{-1}(\cdot)$ is the inverse function of $\pi(\cdot)$. $N$ is the size of the sliding window, $\mathcal{F}_i$ is the set of landmarks whose first observation happens on image $i$, and $S_l$ is the set of images that can see landmark $l$ but are not the first image seeing $l$. $\hat{\boldsymbol{\alpha}}_{k,k+1}$, $\hat{\boldsymbol{\beta}}_{k,k+1}$, $\hat{\boldsymbol{\gamma}}_{k,k+1}$ and $\hat{\boldsymbol{\eta}}_{k,k+1}$ are the nominal states coming from the IMU-odometer pre-integration. $\Delta t_k$ is the time interval between image $k$ and image $k+1$. And $[\cdot]_{xyz}$ denotes the vector part of a quaternion. For more details of (3) to (9), the reader may refer to [11] and [14].

Among the extrinsic parameters $\mathbf{q}_{bc}$, $\mathbf{p}_{bc}$, $\mathbf{q}_{bo}$ and $\mathbf{p}_{bo}$, $\mathbf{p}_{bo}$ is only involved in (7), which can be rewritten as

$$\hat{\boldsymbol{\eta}}_{k,k+1} = \mathbf{R}_{ob}\left(\mathbf{R}_{b_k w}\left(\mathbf{p}^{w}_{b_{k+1}} - \mathbf{p}^{w}_{b_k}\right) + \left(\mathbf{R}_{b_k b_{k+1}} - \mathbf{I}\right)\mathbf{p}_{bo}\right) \tag{10}$$

When the platform moves along a straight line with no rotation, $\mathbf{R}_{b_k b_{k+1}} = \mathbf{I}$, in which case $\mathbf{p}_{bo}$ is not involved in any of the constraints, so it is unobservable.

Fig. 1: Estimation error in the roll angle of $\mathbf{q}_{bc}$, the eigenvalue ratio, and the angular velocity around the Z axis (the vertical axis) at the first turning in each sequence of urban30, urban32, urban33 and urban34, from top to bottom. For each sequence the pink box indicates the first turning.

Similarly, $\mathbf{p}_{bc}$ and $\mathbf{q}_{bc}$ are only involved in (3), which can be rewritten as

$$\frac{1}{\lambda^{l}_{j}}\,\pi^{-1}\!\left(\hat{\mathbf{u}}^{l}_{j}\right) = \mathbf{R}_{cb}\,\mathbf{R}_{b_j b_i}\,\mathbf{R}_{bc}\,\frac{\pi^{-1}\!\left(\hat{\mathbf{u}}^{l}_{i}\right)}{\lambda_l} + \mathbf{R}_{cb}\left(\left(\mathbf{R}_{b_j b_i} - \mathbf{I}\right)\mathbf{p}_{bc} + \mathbf{R}_{b_j w}\left(\mathbf{p}^{w}_{b_i} - \mathbf{p}^{w}_{b_j}\right)\right) \tag{11}$$

When the platform moves along a straight line with no rotation, $\mathbf{p}_{bc}$ is unobservable because $\mathbf{R}_{b_j b_i} = \mathbf{I}$. Moreover, in such a case, for every image pair $(i, j)$, $\mathbf{R}_{b_j w}\left(\mathbf{p}^{w}_{b_i} - \mathbf{p}^{w}_{b_j}\right)$ points in the same direction, to which we refer as the driving direction. Hence the component of the rotation $\mathbf{q}_{bc}$ corresponding to the rotation around the driving direction, which is also called the roll angle in the following, is unobservable as well.

In practice, among the seven unobservable directions (three in $\mathbf{p}_{bc}$, three in $\mathbf{p}_{bo}$ and one in $\mathbf{q}_{bc}$) when the platform moves along a straight line with no rotation, the roll angle in $\mathbf{q}_{bc}$ is the direction whose observability is most relevant to whether the platform has made a turn. To demonstrate this point, we first conduct an eigenvalue decomposition of the approximated Hessian matrix used in the optimization of state estimation, which is the Jacobian matrix's transpose multiplied by the Jacobian matrix. Then we compute the ratio, to the largest eigenvalue, of the eigenvalue corresponding to the eigenvector whose element at the position of the roll angle in $\mathbf{q}_{bc}$ is the largest in absolute value among all the eigenvectors. The larger this eigenvalue ratio is, the better the roll angle in $\mathbf{q}_{bc}$ can be estimated. Figure 1 displays the impact of the first turning on the estimation error in the roll angle of $\mathbf{q}_{bc}$, as well as on the eigenvalue ratio. In practice the estimation error in the roll angle means the roll angle of $\delta\mathbf{q}$, with $\delta\mathbf{q} = \hat{\mathbf{q}}_{bc}^{-1} \otimes \mathbf{q}_{bc}$, where $\hat{\mathbf{q}}_{bc}$ is the current estimated value and $\mathbf{q}_{bc}$ is the real value, which is obtained offline. The experiments in this section are conducted using the state estimation method described in Section II-B; the accelerometer bias is held constant and the extrinsic parameters are adjusted from the beginning, because our focus is on the observability of the extrinsic parameters. In each sequence in Figure 1, the system starts a dozen seconds before the first turning, and the eigenvalue ratio is computed once every 10 optimizations. It is clear from Figure 1 that after the first turning the error in the roll angle of $\mathbf{q}_{bc}$ decreases dramatically and the eigenvalue ratio increases dramatically, both of which indicate that the roll angle in $\mathbf{q}_{bc}$ can be estimated much better after the first turning.
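The eigenvalue-ratio indicator described above can be sketched as follows, assuming access to the stacked Jacobian of the residuals and to the index of the roll-angle parameter in the state vector; the function name is ours.

```python
import numpy as np

def roll_eigenvalue_ratio(J, roll_idx):
    """Ratio in [0, 1]; larger values mean the roll angle is better constrained.

    J:        stacked Jacobian of all residuals w.r.t. the parameters
    roll_idx: index of the roll-angle parameter in the state vector
    """
    H = J.T @ J                           # approximated Hessian
    eigvals, eigvecs = np.linalg.eigh(H)  # H is symmetric positive semidefinite
    # Eigenvector (column) whose roll-angle component dominates in |value|.
    col = int(np.argmax(np.abs(eigvecs[roll_idx, :])))
    return eigvals[col] / eigvals.max()   # ratio to the largest eigenvalue
```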

Theoretically, state estimation is not affected by errors in the unobservable directions of the extrinsic parameters if the platform moves along a perfectly straight line with no rotation. In the real world, however, the motion of the platform is not exactly along a straight line, so errors in these nominally unobservable directions do affect the accuracy of the trajectory. Table I compares the absolute trajectory error (ATE) on four sequences of the KAIST Urban Data Set [6], either using the accurate extrinsic parameters calibrated offline or using extrinsic parameters with an added fixed error (5 degrees in the roll angle of $\mathbf{q}_{bc}$). In each of the four sequences the car moves along an approximately straight road, and the trajectories are computed using the state estimation method described in Section II-B, with the accelerometer bias and extrinsic parameters both held constant. From Table I we can infer that inaccurate extrinsic parameters degrade the accuracy of trajectories before the first turning.

Sequence urban22 urban23 urban24 urban25
EPs accurate 8.8 11.1 15.0 8.0
EPs with error 14.2 13.5 16.2 9.7
  • Here "EPs accurate" means using the accurate extrinsic parameters calibrated offline, and "EPs with error" means using the extrinsic parameters with an added fixed error. ATE, absolute trajectory error; EPs, extrinsic parameters

TABLE I: ATE (in meters) using different extrinsic parameters

IV Method

Considering that the odometer-aided VI-SLAM system described in Section II-B is not stable in the beginning, that the extrinsic parameters cannot be correctly estimated at that stage, and that the accelerometer bias is unobservable until the platform makes a turn, we propose a robust method to acquire an accurate real-time trajectory.

IV-A Forward Computation Thread and Backward Computation Thread

In the very beginning, we propagate the poses and try to initialize our system. After the system is initialized as described in [11], the sliding-window state estimation of Section II-B is performed in the main thread, which we call the forward computation thread. In the forward computation thread, before the first turning, the extrinsic parameters are held constant, and the accelerometer bias is set to zero and held constant, in order to make the system robust in the beginning. At this stage, we bound the magnitude of the marginalization term as described in Section IV-C. Once the platform has made a turn larger than a threshold (45 degrees in our experiments) within a time interval (20 seconds in our experiments), the accelerometer bias starts to be adjusted, and as soon as the estimation of the accelerometer bias converges according to the criterion adopted in [11], the extrinsic parameters start to be adjusted. Thanks to the fact that the marginalization term contains historical information, especially the information gathered during the first turning, the accelerometer bias and extrinsic parameters engaged in state estimation soon reach their desired values. After a further time interval (30 seconds in our experiments) from when the extrinsic parameters begin to be adjusted, we create a new thread named the backward computation thread, while the forward computation thread keeps running. The two threads are independent of each other. Figure 2 is a schematic diagram illustrating forward computation and backward computation; a sketch of this schedule follows the figure.

Fig. 2: Schematic diagram of forward computation and backward computation. Forward computation starts from the beginning; backward computation starts after the first turning. Forward computation continues to operate until the end, while backward computation proceeds back to the starting point to work out more accurate poses.
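The schedule above can be summarized in a hedged sketch. The thresholds follow the text (45 degrees within 20 seconds, then a 30-second delay), while `yaw_change_deg`, `bias_converged` and `spawn_backward` are hypothetical helpers standing in for the system's internals.

```python
# Thresholds from the text: a turn larger than 45 degrees within 20 seconds
# enables bias estimation; the backward thread is spawned 30 seconds after
# the extrinsic parameters start being optimized.
TURN_ANGLE_DEG = 45.0
TURN_WINDOW_S = 20.0
BACKWARD_DELAY_S = 30.0

def update_schedule(state, now, yaw_change_deg, bias_converged, spawn_backward):
    """Advance the forward-thread schedule by one step.

    state: dict with keys 'turned', 'extrinsics_free', 'extrinsics_since',
           'backward_started'
    now: current time in seconds
    yaw_change_deg(window_s): heading change over the last window_s seconds
    bias_converged(): convergence test for the accelerometer bias (as in [11])
    spawn_backward(): starts the backward computation thread
    """
    if not state['turned'] and yaw_change_deg(TURN_WINDOW_S) > TURN_ANGLE_DEG:
        state['turned'] = True               # accelerometer bias is now adjusted
    if state['turned'] and not state['extrinsics_free'] and bias_converged():
        state['extrinsics_free'] = True      # extrinsics are now adjusted
        state['extrinsics_since'] = now
    if (state['extrinsics_free'] and not state['backward_started']
            and now - state['extrinsics_since'] > BACKWARD_DELAY_S):
        spawn_backward()                     # backward computation begins
        state['backward_started'] = True
    return state
```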

The backward computation thread also performs state estimation in a sliding window, where the parameters are as in (1) and the cost function writes as (2). When the backward computation thread is created, the values of its parameters, except for the landmark inverse depths, are copied from the forward computation thread, and its IMU-odometer terms and marginalization term are identical to their counterparts in the forward computation thread. For the backward computation thread, the reprojection errors still take the form of (3), but the first observation in (3) now means the observation on the image with the latest timestamp, instead of the one with the earliest timestamp as in the forward computation thread. Meanwhile, the inverse depth of each landmark is shifted as

$$\frac{1}{\lambda'_l} = \left[\mathbf{R}_{cb}\left(\mathbf{R}_{b_j w}\left(\mathbf{R}_{w b_i}\left(\mathbf{R}_{bc}\,\frac{\pi^{-1}\!\left(\hat{\mathbf{u}}^{l}_{i}\right)}{\lambda_l} + \mathbf{p}_{bc}\right) + \mathbf{p}^{w}_{b_i} - \mathbf{p}^{w}_{b_j}\right) - \mathbf{p}_{bc}\right)\right]_{z} \tag{12}$$

where $\lambda'_l$ and $\lambda_l$ are the inverse depths of landmark $l$ after and before shifting respectively, $[\cdot]_z$ denotes the depth (third) component of a vector, $i$ and $j$ are the indexes of the earliest and latest images that can see the landmark in the sliding window respectively, and the meanings of the other symbols are the same as those in (3).
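The shift in (12) is a re-anchoring of the inverse depth from one camera frame to another. The following is a minimal sketch of that generic operation under our notation assumptions (normalized image coordinates, camera poses given in the world frame); it is an illustration, not the system's exact routine.

```python
import numpy as np

def shift_inverse_depth(lam_i, u_i, R_w_ci, p_w_ci, R_w_cj, p_w_cj):
    """Re-anchor an inverse depth from camera frame i to camera frame j.

    lam_i: inverse depth of the landmark in camera frame i
    u_i:   normalized image coordinates (x, y) of the landmark in frame i
    R_w_ci, p_w_ci: pose of camera i in the world frame (likewise for j)
    """
    P_ci = np.array([u_i[0], u_i[1], 1.0]) / lam_i  # 3D point in camera i
    P_w = R_w_ci @ P_ci + p_w_ci                    # point in the world frame
    P_cj = R_w_cj.T @ (P_w - p_w_cj)                # point in camera j
    return 1.0 / P_cj[2]                            # new inverse depth
```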

(a) sliding window in the forward computation thread
(b) sliding window in the backward computation thread
Fig. 3: Contrast between the sliding windows in the forward and backward computation threads. In either thread, when a new frame is inputted, the frame marked with a red cross is discarded if it is not a keyframe; otherwise, the frame in the dashed red box is marginalized.

Parameters can be estimated correctly in the backward computation thread from the moment it starts, because of the information contained in the marginalization error term. Figures 3(a)-3(b) show the contrast between the sliding windows in the forward and backward computation threads. In the following we illustrate how the backward computation thread operates according to Figure 3(b). Suppose that the first frame in the data sequence is $F_0$, and that at a certain moment there are $N+1$ frames in the sliding window, namely $F_k, F_{k+1}, \ldots, F_{k+N}$. In the backward computation thread, every time the nonlinear optimization in the sliding window finishes, the next frame to be inputted is $F_{k-1}$, the one previous to the frame whose timestamp is the earliest in the sliding window. The IMU-odometer pre-integration between these two frames ($F_{k-1}$ and $F_k$) is computed, and the initial value of the pose and velocity of the newly inputted frame ($F_{k-1}$) is propagated using the IMU measurements between the two frames. Note that although the IMU-odometer pre-integration between these two frames was already performed in the forward computation thread, recomputation is needed because the estimated values of the IMU biases and the extrinsic parameters have changed, and they are engaged in pre-integration. Next, if the frame with the second earliest timestamp ($F_k$) is not a keyframe, it is discarded; otherwise, the frame with the latest timestamp ($F_{k+N}$) is marginalized. The criterion to judge whether an image frame is a keyframe is the same as that in [14]. The backward computation terminates when the first frame in the data sequence ($F_0$) has been inputted into the sliding window. The IMU and wheel encoder measurements and the feature points used in backward computation are recorded beforehand during the forward computation.
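The window management just described can be sketched as follows, with `recompute_preintegration`, `optimize_window` and `is_keyframe` as hypothetical stand-ins for the system's routines; this is a structural illustration, not the actual implementation.

```python
def backward_pass(frames, window, recompute_preintegration,
                  optimize_window, is_keyframe):
    """frames: recorded frames in time order; window: frames currently in
    the sliding window, earliest first."""
    next_idx = frames.index(window[0]) - 1  # frame previous to the earliest
    while next_idx >= 0:
        new_frame = frames[next_idx]
        # Pre-integration must be recomputed: biases and IMU-odometer
        # extrinsics have changed since the forward pass.
        recompute_preintegration(new_frame, window[0])
        window.insert(0, new_frame)         # input the earlier frame
        optimize_window(window)             # sliding-window optimization
        if not is_keyframe(window[1]):      # second earliest frame
            window.pop(1)                   # discard the non-keyframe
        else:
            window.pop()                    # marginalize the latest frame
        next_idx -= 1                       # proceed toward the first frame
```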

When the pose of a certain frame is estimated in the backward computation thread, it is used to substitute the corresponding pose previously estimated in the forward computation thread, because the poses estimated in the backward computation thread are more accurate.

IV-B Computation of Real-time Trajectory

Every time the backward computation thread updates the pose of a certain frame, we compute a continuous real-time trajectory. Let $\mathbf{T}_i$ denote the pose of frame $i$ in the real-time trajectory, and let $k$ denote the index of the frame that has just been updated by the backward computation thread. For the frames before frame $k$, i.e. $i < k$, the pose is just what was computed in Section IV-A, and for the frames after frame $k$, i.e. $i > k$, the pose is computed as

$$\mathbf{T}'_i = \mathbf{T}'_k\,\mathbf{T}_k^{-1}\,\mathbf{T}_i \tag{13}$$

where $\mathbf{T}_k$ and $\mathbf{T}'_k$ are the poses of frame $k$ before and after being updated by the backward computation thread respectively, and $\mathbf{T}_i$ is the pose of frame $i$ computed in Section IV-A.
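A minimal sketch of the adjustment in (13), assuming 4x4 homogeneous pose matrices: poses after the just-updated frame $k$ are rigidly re-anchored by the correction $\mathbf{T}'_k\,\mathbf{T}_k^{-1}$.

```python
import numpy as np

def adjust_realtime_trajectory(poses, k, T_k_new):
    """poses: list of 4x4 world poses from forward computation; frame k has
    just been refined to T_k_new by the backward computation thread."""
    correction = T_k_new @ np.linalg.inv(poses[k])  # T_k' @ inv(T_k)
    out = list(poses)
    out[k] = T_k_new
    for i in range(k + 1, len(out)):   # frames after frame k
        out[i] = correction @ out[i]   # (13): T_i' = T_k' T_k^{-1} T_i
    return out
```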

IV-C Bounded Marginalization Term

The marginalization residual takes the form of $\mathbf{e} + \mathbf{J}\,\delta\mathcal{X}$, where $\mathbf{e}$ and $\mathbf{J}$ are computed in the marginalization process, and $\delta\mathcal{X}$ is the step used to update the parameters $\mathcal{X}$. We have observed that before the first turning, when the accelerometer bias and extrinsic parameters are held constant, the marginalization error keeps growing and thus the total error keeps growing. In order to reduce the accumulation of error caused by the inaccurate extrinsic parameters and accelerometer bias in the marginalization error term, and to prevent this error from dominating the state estimation, before the first turning we multiply $\mathbf{e}$ and $\mathbf{J}$ by a ratio $\lambda$ once the ratio of the marginalization error to the total error in (2) after optimization rises beyond a threshold $\theta$. In our experiments $\theta$ is set to 0.85 and $\lambda$ is set to 0.4.
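A sketch of this bound, with the symbol names above taken as our reconstruction: when the marginalization share of the total cost after optimization exceeds $\theta$, the stored residual $\mathbf{e}$ and Jacobian $\mathbf{J}$ of the marginalization prior are both scaled by $\lambda$, which scales its squared-error contribution by $\lambda^2$.

```python
THETA = 0.85  # threshold on marg_cost / total_cost (theta in the text)
LAM = 0.4     # shrink ratio applied to e and J (lambda in the text)

def bound_marginalization(e, J, marg_cost, total_cost):
    """Return possibly down-weighted (e, J) for the marginalization prior."""
    if total_cost > 0.0 and marg_cost / total_cost > THETA:
        return LAM * e, LAM * J  # squared error shrinks by LAM**2
    return e, J
```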

V Experiments

We evaluate the effect of the proposed bidirectional trajectory computation method on the KAIST Urban Data Set [6], a publicly available dataset containing data from complex urban scenes collected on a rear-wheel-drive passenger car. The sensors in the dataset include stereo cameras, one IMU and two wheel encoders mounted on the two rear wheels. The frequencies of the captured images, IMU measurements and wheel encoder measurements are 10 Hz, 100 Hz and 100 Hz respectively. The proposed approach is compared with the stereo-inertial version of the state-of-the-art VI-SLAM system VINS-Fusion [15], the standard VIO [10], and the odometer-aided VI-SLAM approaches [20], [21] and [11]. The proposed approach and [11] use a monocular camera, one IMU and one wheel encoder. The approaches [20] and [21], as reported in their papers, use a monocular camera, one IMU and two wheel encoders. The stereo-inertial version of VINS-Fusion uses stereo cameras and one IMU. All the experiments presented are performed on a PC with an Intel Core i7 3.6 GHz 6-core CPU and 64 GB memory. The extrinsic parameters provided in the dataset, which may not be very accurate, are adopted as initial values.

V-A Average Positioning Error by Aligning the Starting Frame

The primary concern of our bidirectional trajectory computation method is to improve the accuracy at the initial stage, which matters a lot when we only know the position and orientation of the vehicle at the starting point. In our first evaluation, we align the positions and orientations of the starting image frames of the trajectory from each VI-SLAM approach and the ground truth trajectory, and compute the average positioning error over every frame in the data sequence. The practice of aligning the starting frames is also adopted in the evaluation criteria of the KITTI dataset [3]. Here our proposed approach is mainly compared with [11], on which our approach is based. The work [11] starts to optimize the accelerometer bias from the beginning, and fixes the extrinsic parameters until the platform has made a turn and the estimation of the accelerometer bias has converged, in order to reduce the instability in the very beginning. To make an exhaustive comparison of different strategies for dealing with the accelerometer bias and extrinsic parameters, the two instability factors, we derive several adapted versions of [11]: (i) both the accelerometer bias and extrinsic parameters are fixed until the first turning (FAFE), (ii) the extrinsic parameters start to be optimized from the beginning, and the accelerometer bias is fixed until the first turning (FAOE), (iii) both the accelerometer bias and extrinsic parameters start to be optimized from the beginning (OAOE). Our proposed approach is first compared against [11] (OAFE) and the three adapted versions above, as well as VINS-Fusion [15]. To make a fair comparison, we select the image frame at which the vehicle has traveled 100 meters as the starting frame, to avoid being affected by erroneous pose estimations from some approaches in the very beginning. This comparison is made on all 15 sequences with stereo cameras and with complexity level 3 (middle) or level 4 (high) in [6], namely urban25-urban39. The comparison of average positioning error by aligning the starting frame is shown in Table II.

Sequence Proposed FAFE FAOE OAOE OAFE[11] VINS-Fusion[15] Trajectory length
urban25* 11.6 11.3 72.7 62.1 15.3 862.7 2.5km
urban26 26.6 41.0 42.8 29.8 41.7 52.9 4.0km
urban27 11.7 49.5 73.4 91.9 44.5 63.1 5.4km
urban28 35.2 61.5 47.7 27.7 104.8 103.4 11.5km
urban29 50.4 44.3 12.1 13.3 40.1 122.6 3.6km
urban30 29.8 34.6 36.8 45.5 43.0 6.0km
urban31 712.7 1107.6 995.9 1072.0 1229.5 1738.9 11.4km
urban32 39.2 407.5 140.6 149.1 422.6 257.9 7.1km
urban33 37.4 177.3 81.9 130.9 221.3 696.7 7.6km
urban34 64.5 98.8 160.9 122.6 168.7 7.8km
urban35* 58.6 49.3 290.9 253.4 57.5 3.2km
urban36* 345.9 333.1 221.5 281.0 283.7 9.0km
urban37* 421.6 371.0 1989.0 2220.8 677.0 1125.7 11.8km
urban38 33.3 123.4 151.9 44.8 101.3 134.6 11.4km
urban39 12.0 953.7 22.0 36.8 42.1 11.0km
  • Here "Proposed" means the approach proposed in this paper, and a missing entry means failure. The sequences marked with '*' do not contain turnings, so the difference in accuracy between the proposed approach and FAFE on those sequences results only from restricting the marginalization error as described in Section IV-C. FAFE, fixing accelerometer bias and fixing extrinsic parameters; FAOE, fixing accelerometer bias and optimizing extrinsic parameters; OAOE, optimizing accelerometer bias and optimizing extrinsic parameters; OAFE, optimizing accelerometer bias and fixing extrinsic parameters

TABLE II: Comparison of average positioning error (in meters) by aligning the starting frame

Table II indicates that the proposed approach outperforms all five other approaches on 9 of the 15 sequences. Among the remaining six sequences, urban25, urban35, urban36 and urban37 do not contain turnings, so the proposed bidirectional trajectory computation does not come into play on these sequences. The accuracy of the proposed approach on these four sequences is generally higher than FAOE, OAOE and the stereo VI-SLAM [15], and comparable with OAFE [11], but slightly lower than FAFE. That is because the manipulation described in Section IV-C can cause information loss: when the trajectory does not contain turnings, the only difference between the proposed approach and FAFE lies in that manipulation. However, in view of the good performance of the proposed approach on the other sequences with turnings, where the bidirectional trajectory computation does come into play, the benefit of the manipulation in Section IV-C dramatically outweighs its cost.

Generally speaking, the accuracy of the proposed approach is higher than that of [11], as well as of its adapted versions that deal with the accelerometer bias and extrinsic parameters differently.
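The metric used in this subsection (aligning only the starting frame, then averaging position errors over all frames) can be sketched as follows, assuming 4x4 homogeneous pose matrices for both trajectories; the function name is ours.

```python
import numpy as np

def avg_error_start_aligned(est_poses, gt_poses):
    """Mean position error after aligning the first frames of both trajectories."""
    # Rigid transform that maps the estimated starting pose onto the ground truth.
    T_align = gt_poses[0] @ np.linalg.inv(est_poses[0])
    errs = [np.linalg.norm((T_align @ T_e)[:3, 3] - T_g[:3, 3])
            for T_e, T_g in zip(est_poses, gt_poses)]
    return float(np.mean(errs))
```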

V-B Absolute Trajectory Error (ATE) Comparison

We also make a broader comparison with more approaches, including the odometer-aided VI-SLAM approaches [20], [21] and [11], the stereo VI-SLAM system VINS-Fusion [15], and the standard VIO [10]. The comparison is made in terms of the absolute trajectory error (ATE), which is the root mean square error (RMSE) of the positions after a 6-DoF trajectory alignment with the ground truth. The experiments are conducted on the sequences urban26, urban28, urban38 and urban39, because only the ATEs of these four sequences are reported in [21]. The comparison results are shown in Table III.
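The ATE computation can be sketched as follows, given corresponding position samples. The closed-form alignment is a standard Umeyama-style solution with the scale fixed to 1, which is one common way to realize the 6-DoF alignment mentioned above.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of corresponding positions."""
    mu_e, mu_g = est_xyz.mean(0), gt_xyz.mean(0)
    # Closed-form rotation from the cross-covariance of centered positions.
    U, _, Vt = np.linalg.svd((gt_xyz - mu_g).T @ (est_xyz - mu_e))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # proper rotation
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    aligned = est_xyz @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))
```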

Table III indicates that the proposed approach is clearly more accurate than other approaches in terms of ATE on those four sequences.

V-C Evaluation of Effects on Estimating the Accelerometer Bias and Extrinsic Parameters

We examine the effects of the proposed bidirectional trajectory computation approach on estimating the accelerometer bias and the extrinsic parameters, in order to reveal why it improves the accuracy. The proposed approach is compared with the approach OAOE of Section V-A, which optimizes the accelerometer bias and extrinsic parameters from the beginning and only performs forward computation. Figure 4 shows the comparison of the estimated values of the accelerometer bias and the estimation error in the roll angle of $\mathbf{q}_{bc}$ between the two approaches, at the first turning in each sequence of urban27, urban28, urban30 and urban34. As in Section III, the system starts a dozen seconds before the first turning in each sequence instead of starting from the very beginning. Figure 4 indicates that the estimation error in the roll angle of $\mathbf{q}_{bc}$ is much smaller using the proposed approach, and that the estimated value of the accelerometer bias is more stable over time, which is more reasonable because the accelerometer bias is a slowly time-varying quantity. Besides, both approaches can estimate the z-component of the accelerometer bias well. That is because both of the two unobservable directions of the accelerometer bias before the first turning, which form an orthogonal basis of a two-dimensional subspace, have very small values in their respective z-components, considering that $\mathbf{R}_{wb}$ is a rotation mainly around the Z axis for a ground vehicle and the gravity direction is exactly along the Z axis. The ground-truth value of $\mathbf{q}_{bc}$ is obtained offline, and the method to compute the error in the roll angle is the same as that in Section III.

Fig. 4: Comparison of the estimated accelerometer bias and the estimation error in the roll angle of $\mathbf{q}_{bc}$. acc_bias_x_uni, acc_bias_y_uni, acc_bias_z_uni and roll_angle_error_uni are respectively the three components of the accelerometer bias and the roll angle error estimated by the unidirectional computation method OAOE. acc_bias_x_bi, acc_bias_y_bi, acc_bias_z_bi and roll_angle_error_bi are the corresponding quantities estimated by our proposed bidirectional trajectory computation method. acc, accelerometer; uni, unidirectional; bi, bidirectional

V-D Computation of Real-time Trajectory

To illustrate the effect of computing the real-time trajectory, we take the sequence urban32 as an example. Figures 5(a)-5(d) display the real-time trajectories 0, 3, 6 and 9 minutes after backward computation starts, compared with the ground truth trajectory. We can see that as backward computation proceeds, the real-time trajectory gradually becomes closer and closer to the ground truth trajectory. The performance on this sequence is shown in the supplementary video.

Sequence Proposed [11] [21] [20] VINS-Fusion[15] [10] Trajectory length
urban26 9.8 11.9 14.8 16.1 22.5 32.8 4.0km
urban28 19.8 27.8 25.0 33.1 93.3 34.7 11.5km
urban38 14.0 16.0 33.5 43.0 90.0 55.5 11.4km
urban39 7.2 8.0 21.3 24.0 33.4 11.0km
  • Here "Proposed" means the approach proposed in this paper, and a missing entry means failure. Results for [21], [20] and [10] are taken from the results reported in [21]. ATE, absolute trajectory error

TABLE III: Comparison of ATE (in meters) among different approaches
(a) Trajectory after 0 minutes
(b) Trajectory after 3 minutes
(c) Trajectory after 6 minutes
(d) Trajectory after 9 minutes
Fig. 5: Real-time trajectories in urban32 at different moments.

VI Conclusion

In this paper, we propose a bidirectional trajectory computation method for VI-SLAM aided by a wheel encoder. First, we perform an observability analysis of the degenerate case that an odometer-aided VI-SLAM system deployed on a car may encounter before the first turning. Second, we describe the proposed backward computation thread, which refines the poses before the first turning, as well as the method to adjust the real-time trajectory. Experimental results show the higher accuracy of the whole trajectory, the correctly estimated parameters before the first turning, and the effect of real-time trajectory adjustment. Although a wheel encoder is used in this paper, we believe that the proposed bidirectional trajectory computation method can also be applied to VI-SLAM systems that are not aided by wheel encoders.

Appendix

Hereafter we prove that the roll and pitch angles are unobservable when the platform has no rotational motion, even if wheel encoders are used. For brevity, the proof is given by extending the derivations in [17]. According to [17], let $\mathbf{N}$ denote the direction

(14)

where we only consider the 3D position of a single landmark, as was actually done in [17]. For any block row $\mathbf{M}_k$ (see (24) in [17]) of the observability matrix as in (39) of [4], the work [17] has already obtained $\mathbf{M}_k\,\mathbf{N} = \mathbf{0}$ if the platform has no rotational motion, i.e.

(15)

where we use $b$ to denote the IMU frame at any moment by slightly abusing the symbol. Therefore, under these circumstances, whether the roll and pitch angles are unobservable depends on whether $\mathbf{M}'_k\,\mathbf{N} = \mathbf{0}$, with $\mathbf{M}'_k$ being any of the extra block rows of the observability matrix provided by the wheel encoder measurements (see (38) in [17]). $\mathbf{M}'_k$ takes the form of

(16)

where the three factors are as (46), (104) and (112) in [4] respectively. (Note that there is a sign typo in (104) in [4], which must be corrected when translating it into the symbol system of this paper.) When (15) is satisfied, these factors simplify accordingly, with $\Delta t_k$ being the time interval between image $k$ and image $k+1$. Hence

(17)

In such a case, considering the form of $\mathbf{N}$ and (17), we obtain

(18)

Hence the roll and pitch angles are still unobservable when the platform has no rotational motion, even if wheel encoders are used.

References

  • [1] Sameer Agarwal, Keir Mierle, and Others. Ceres solver. http://ceres-solver.org, 2017.
  • [2] Zhiqiang Dang, Tianmiao Wang, and Fumin Pang. Tightly-coupled data fusion of vins and odometer based on wheel slip estimation. In 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1613–1619. IEEE, 2018.
  • [3] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3354–3361. IEEE, 2012.
  • [4] Joel A Hesch, Dimitrios G Kottas, Sean L Bowman, and Stergios I Roumeliotis. Consistency analysis and improvement of vision-aided inertial navigation. IEEE Transactions on Robotics, 30(1):158–176, 2014.
  • [5] G. Huang. Visual-inertial navigation: A concise review. In 2019 International Conference on Robotics and Automation (ICRA), pages 9572–9582. IEEE, May 2019.
  • [6] Jinyong Jeong, Younggun Cho, Young-Sik Shin, Hyunchul Roh, and Ayoung Kim. Complex urban dataset with multi-level sensors from highly diverse urban environments. The International Journal of Robotics Research, 38(6):642–657, 2019.
  • [7] Rong Kang, Lu Xiong, Mingyu Xu, Junqiao Zhao, and Peizhi Zhang. Vins-vehicle: A tightly-coupled vehicle dynamics extension to visual-inertial state estimator. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 3593–3600. IEEE, 2019.
  • [8] Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart, and Paul Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3):314–334, 2015.
  • [9] Dongxuan Li, Kevin Eckenhoff, Kanzhi Wu, Yue Wang, Rong Xiong, and Guoquan Huang. Gyro-aided camera-odometer online calibration and localization. In 2017 American Control Conference (ACC), pages 3579–3586. IEEE, 2017.
  • [10] Mingyang Li and Anastasios I Mourikis. Optimization-based estimator design for vision-aided inertial navigation. In 2013 Robotics: Science and Systems, pages 241–248, 2013.
  • [11] Jinxu Liu, Wei Gao, and Zhanyi Hu. Visual-inertial odometry tightly coupled with wheel encoder adopting robust initialization and online extrinsic calibration. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5391–5397. IEEE, 2019.
  • [12] Fangwu Ma, Jinzhu Shi, Yu Yang, Jinhang Li, and Kai Dai. ACK-MSCKF: Tightly-coupled Ackermann multi-state constraint Kalman filter for autonomous vehicle localization. Sensors, 19(21):4816, 2019.
  • [13] Anastasios I Mourikis and Stergios I Roumeliotis. A multi-state constraint kalman filter for vision-aided inertial navigation. In 2007 IEEE International Conference on Robotics and Automation, pages 3565–3572. IEEE, 2007.
  • [14] Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.
  • [15] Tong Qin, Jie Pan, Shaozu Cao, and Shaojie Shen. A general optimization-based framework for local odometry estimation with multiple sensors. arXiv preprint arXiv:1901.03638, 2019.
  • [16] Meixiang Quan, Songhao Piao, Minglang Tan, and Shi-Sheng Huang. Tightly-coupled monocular visual-odometric slam using wheels and a mems gyroscope. IEEE Access, 7:97374–97389, 2019.
  • [17] Kejian J Wu, Chao X Guo, Georgios Georgiou, and Stergios I Roumeliotis. Vins on wheels. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5155–5162. IEEE, 2017.
  • [18] Yulin Yang, Patrick Geneva, Kevin Eckenhoff, and Guoquan Huang. Degenerate motion analysis for aided ins with online spatial and temporal sensor calibration. IEEE Robotics and Automation Letters, 4(2):2070–2077, 2019.
  • [19] Wenlong Ye, Renjie Zheng, Fangqiang Zhang, Ziyou Ouyang, and Yong Liu. Robust and efficient vehicles motion estimation with low-cost multi-camera and odometer-gyroscope. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4490–4496. IEEE, 2019.
  • [20] Mingming Zhang, Yiming Chen, and Mingyang Li. Vision-aided localization for ground robots. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2455–2461. IEEE, 2019.
  • [21] Mingming Zhang, Xingxing Zuo, Yiming Chen, and Mingyang Li. Localization for ground robots: On manifold representation, integration, re-parameterization, and optimization. arXiv preprint arXiv:1909.03423, 2019.
  • [22] Xingxing Zuo, Mingming Zhang, Yiming Chen, Yong Liu, Guoquan Huang, and Mingyang Li. Visual-inertial localization for skid-steering robots with kinematic constraints. arXiv preprint arXiv:1911.05787, 2019.