I Introduction
In this work, we present a sensor self-calibration method for visual-inertial ego-motion estimation frameworks, i.e., systems that fuse visual information from one or multiple cameras with an inertial measurement unit (IMU) to track the pose (position and orientation) of the sensors over time. Over recent years, visual-inertial tracking has become increasingly popular and is being deployed in a wide variety of products, including AR/VR headsets, mobile devices, and robotic platforms. Large-scale projects, such as Microsoft's HoloLens, make these complex systems available as part of mass-consumer devices operated by non-experts over the entire lifespan of the product. This transition from the traditional lab environment to the consumer market poses new technical challenges in keeping the calibration of the sensors up to date.
Traditionally, visual-inertial sensors are calibrated in a laborious manual process by an expert, often using specialized tools and external markers such as checkerboard patterns (e.g., [1]). Aside from a lack of equipment, the lack of knowledge on how to properly excite all modes usually renders these methods infeasible for consumers, as specific motion is required to obtain a consistent calibration. However, such methods can be used at the factory to provide an initial calibration for the device. Due to varying conditions (e.g., temperature, shocks, etc.), such calibrations degrade over time, and periodic recalibrations become necessary. A straightforward approach to this problem would be to run a calibration over a long dataset, hoping it is rich enough to excite all modes of the system. Yet, the large computational requirements of such a batch method might render this approach infeasible on constrained platforms without careful data selection.
This work exploits the fact that information is usually not distributed uniformly along the trajectory of most visual-inertial datasets, as illustrated in Fig. 1 for a mountain-bike dataset. Trajectory segments with higher excitation provide more information for sensor calibration, whereas segments with weak excitation can lead to an inconsistent or even wrong calibration. Consequently, we propose a calibration architecture that evaluates the information content of trajectory segments in a background process alongside an existing visual-inertial estimation framework. A database maintains the most informative segments that have been observed, either in a single session or over multiple sessions, to accumulate relevant calibration data over time. Subsequently, the collected segments are used to update the calibration parameters using a segment-based calibration formulation.
By including only the most informative portion of the trajectory, we are able to reduce the size of the calibration dataset considerably. Further, we can collect exciting motion in a background process, assuming such motion occurs eventually, and thus relieve users of the burden of performing it consciously (which might be hard for non-experts). With this approach we can automate the traditionally tedious calibration task and perform a recalibration without any user intervention, e.g., while playing an AR/VR video game or while navigating a car through the city. Additionally, our method facilitates the use of more advanced sensor models (e.g., IMU intrinsics) with potentially weakly observable modes that require specific motion for a consistent calibration.
This article is an extension of our previous work [2], in which we presented the following:

an efficient information-theoretic metric to identify informative segments for calibration,

a segment-based self-calibration method for the intrinsic and extrinsic parameters of a visual-inertial system, and

evaluations of the calibration parameter repeatability showing comparable performance to a batch approach.
In this work, we extend it with the following contributions:

a comprehensive review of the state of the art in visual and inertial sensor calibration,

a study of three different metrics for the selection of informative segments,

an evaluation of the motion estimation accuracy on motion-capture ground-truth, and

a comparison against an extended Kalman filter (EKF) approach that jointly estimates motion and calibration parameters.
II Literature Review
Over the past two decades, visual-inertial state estimation has been studied extensively by the research community, and many methods and frameworks have been presented. For example, the work of Leutenegger et al. [3] fuses the information of both sensor modalities in a fixed-lag smoother estimation framework and demonstrates metric pose tracking with an accuracy in the sub-percent range of distance traveled. Many applications on resource-constrained platforms, such as mobile phones, however, use filtering-based approaches, which offer pose tracking with similar accuracy at a lower computational cost. An early method of this form is that of Mourikis and Roumeliotis [4]; more recently, Bloesch et al. [5] presented a filter that directly minimizes a photometric error on image patches instead of a geometric reprojection error on point features. Newer frameworks, e.g., from Qin et al. [6] or Schneider et al. [7], also incorporate online localization/loop-closures to further reduce the drift or, in certain cases, even eliminate it completely.
All these methods require an accurate and up-to-date calibration of all sensor models to achieve good estimation performance. For this reason, a multitude of methods have been developed to calibrate models for the camera, the IMU, and the relative pose between the two sensors. An overview of early methods that calibrate each model independently can be found in [8, 9, 10]. In the remainder of this section we first provide an overview of the state of the art in self-calibration of visual-inertial sensor systems and second discuss the most relevant observability-aware calibration approaches. Finally, we review methods that perform information-theoretic data selection for calibration purposes, which are most closely related to our approach.
II-A Marker-based Calibration
The work on self-calibration of visual and inertial sensors is still limited, and therefore we first discuss approaches that rely on external markers such as checkerboard patterns. An approach based on an EKF is presented in [11] that uses a checkerboard pattern as a reference to jointly estimate the relative pose between an IMU and a camera together with the pose, velocity, and biases. Zachariah and Jansson [12] additionally estimate the scale error and misalignment of the inertial axes using a sigma-point Kalman filter.
A parametric method is proposed in [13], describing a batch estimator in continuous time that represents the pose and bias trajectories using B-splines. Krebs [14] extends this work by compensating additional sensing errors in the IMU model, namely measurement scale, axis misalignment, cross-axis sensitivity, the effect of linear accelerations on gyroscope measurements, and the orientation between the gyroscope and the accelerometer. A similar model is calibrated by Nikolic et al. [15], who use a non-parametric batch formulation and thus avoid the selection of a basis function for the pose and bias trajectories, which might depend on the dynamics of the motion (e.g., through the knot density). The non-parametric and parametric formulations are compared in real-world experiments, with the conclusion that the accuracy and precision of both methods are similar [15].
II-B Markerless Calibration
In contrast to target-based methods, self-calibration methods rely solely on natural features to calibrate the sensor models, without the need for external markers such as checkerboards. Early work of this form was presented by Kelly and Sukhatme [16] and uses an unscented Kalman filter to jointly estimate pose, bias, velocity, the IMU-to-camera relative pose, and also the local scene structure. Their real-world experiments demonstrate that the relative pose between a camera and an IMU can be estimated accurately, with similar quality to target-based methods. The work of Patron-Perez et al. [17] additionally calibrates the camera intrinsics and uses a continuous-time formulation with a B-spline parameterization. Li et al. [18] go one step further and also include the following calibration parameters in their (non-parametric) EKF-based estimator: the time offset between camera and IMU, scale errors and axis misalignment of all inertial axes, the linear acceleration effect on the gyroscope measurements (g-sensitivity), the camera intrinsics including lens distortion, and the rolling-shutter line delay. A simulation study and real-world experiments indicate that all these quantities can indeed be estimated online based solely on natural features [18].
II-C Observability of Model Parameters
All of the calibration methods discussed so far, both target-based and self-calibration methods, rely on sufficient excitation of all sensor models to yield an accurate calibration. Mirzaei and Roumeliotis [11] formally prove that the IMU-to-camera extrinsics are observable in a target-based calibration setting, where the observability depends only on sufficient rotational motion. The analysis of Kelly and Sukhatme [16] shows that the IMU-to-camera extrinsics remain observable also in a self-calibration formulation. Further, Li and Mourikis [19] derive the necessary condition for the identifiability of a constant time offset between the IMU and camera measurements.
So far, no observability analysis has been performed for the full joint self-calibration problem that includes the intrinsics of the IMU and camera as well as the relative pose between the two sensors. Our experience, however, indicates that 'rich', exciting motion is required to render all parameters observable, and usually such calibration datasets are collected by expert intuition. Often, this knowledge is missing when simultaneous localization and mapping (SLAM) systems are deployed in consumer-market products. For this reason, the (re-)calibration dataset collection process must be automated for true lifelong autonomy.
II-D Active Observability-aware Calibration
Active calibration methods automate the dataset collection by planning and executing trajectories which ensure the observability of the calibration parameters w.r.t. a specified metric. An early work in this direction for target-based camera calibration is [20]. The authors present an interactive method that suggests the next view of the target to capture such that the quality of the model improves incrementally.
Another active calibration method is presented by Bähnemann et al. [21], who plan informative trajectories using a sampling-based planner to calibrate Micro Aerial Vehicle (MAV) models. The informativeness of a candidate trajectory segment within the planner is approximated by the determinant of the covariance of the calibration parameters, which is propagated using an EKF. In a similar setting, Hausman et al. [22] plan informative trajectories to calibrate the model of an Unmanned Aerial Vehicle (UAV) using the local observability Gramian as an information measure. An extension to this work is presented by Preiss et al. [23], who additionally consider free-space information and dynamic constraints of the vehicle within the planner. The condition number of the Expanded Empirical Local Observability Gramian (E^2LOG) is proposed as an information metric. The columns of the E^2LOG are scaled using empirical data to balance the contribution of multiple states. A simulation study shows that the method outperforms random motion and also well-known heuristics, such as the figure-8 or star motion patterns. Further, the study indicates that trajectories minimizing the E^2LOG perform slightly better than those minimizing the trace of the covariance matrix but in general yield comparable performance.
II-E Passive Observability-aware Calibration: Calibration on Informative Segments
In contrast to the class of active calibration methods, passive methods cannot influence the motion and instead identify and collect informative trajectory segments to build a complete calibration dataset over time. The framework of Maye et al. [24] selects a set of the most informative segments using an information gain measure and then performs a calibration on the selected data. A truncated-QR solver is used to limit updates to the observable subspace. The generality of this method makes it suitable for a wide range of problems. Unfortunately, the expensive information metric and optimization algorithm prevent its use on resource-constrained platforms. Similarly, Keivan and Sibley [25] maintain a database of the most informative images to calibrate the intrinsic parameters of a camera but use a more efficient entropy-based information metric for the selection. Nobre et al. [26] extend the same framework to calibrate multiple sensors, and more recently Nobre et al. [27] also include the relative pose between an IMU and a camera.
In our work, we take an approach similar to [24, 25] but also consider inertial measurements and consequently collect informative segments instead of images. In contrast to the general method of [24], we use an approximation for the visual-inertial use case and neglect any cross-terms between segments when evaluating their information content. This approximation increases efficiency at the cost that no loop-closure constraints can be considered. Compared to [27], we assume the calibration parameters to be constant over a single session but additionally calibrate the intrinsic parameters of the IMU using a model similar to [14, 28].
III Visual and Inertial System
The visual-inertial sensor system considered in this work consists of a global-shutter camera and an IMU. For better readability, the formulation is presented for a single camera only; however, the method has been tested with multiple cameras as well. All sensors are assumed to be rigidly attached to the sensor system. The IMU itself consists of a 3-axis accelerometer and a 3-axis gyroscope. In this work, we assume an accurate temporal synchronization of the IMU and camera measurements and exclude the estimation of the clock offset and skew. However, online estimation of these clock parameters is feasible, as shown in [19].
The following subsections introduce the sensor models for the camera and IMU. An overview of all model parameters is shown in Table I, and all relevant coordinate frames of the visual and inertial system are shown in Fig. 2.
Table I: Overview of the model parameters.

Parameter                                      Unit
Camera
  focal length                                 px
  principal point                              px
  distortion                                   -
IMU
  axis misalignment (gyro, accel.)             -
  axis scale (gyro, accel.)                    -
  rotation of gyroscope w.r.t. accelerometer   -
Extrinsics
  translation of camera w.r.t. IMU             m
  rotation of camera w.r.t. IMU                -
III-A Notation and Definitions
A transformation matrix T_AB takes a vector p_B, expressed in the frame of reference B, into the coordinates of frame A and can be further partitioned into a rotation matrix R_AB and a translation vector t_AB as follows:

    T_AB = [ R_AB  t_AB ]
           [ 0^T   1    ]        (1)

The unit quaternion q_AB represents the rotation corresponding to R_AB as defined in [29]. Transforming a vector p_B from frame B to the frame of reference A is thus written as p_A = R_AB p_B + t_AB, according to Eq. (1).
III-B Camera Model
A function h(·) models the perspective projection and lens distortion effects of the camera. It maps the i-th 3d landmark l_i, expressed in camera coordinates, onto the image plane of the camera to yield the 2d image point z_i as:

    z_i = h(l_i; θ_c)        (2)

where θ_c denotes the model parameters of the perspective projection function (which we want to calibrate).
In our evaluation setup, we use high-field-of-view cameras, as they typically yield more accurate motion estimates [30]. As a consequence, the camera records a heavily distorted image of the world. To account for these effects, we augment the pinhole camera model with the field-of-view (FOV) distortion model [31] to obtain the following perspective projection function:

    h(l; θ_c) = [ f_u  0  ] (d(r)/r) p_n + [ c_u ]
                [ 0   f_v ]                [ c_v ]        (3)

where (f_u, f_v) denotes the focal length, (c_u, c_v) the principal point, and p_n the 2d projection of a 3d landmark l = (x, y, z) in normalized image coordinates as:

    p_n = [ x/z, y/z ]^T        (4)
The function d(r) models the (symmetric) distortion effects as a function of the radial distance r = ||p_n|| to the optical center as:

    d(r) = (1/ω) · arctan( 2 r tan(ω/2) )        (5)

with ω being the single parameter of the FOV distortion model.
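As a concrete illustration, the projection with FOV distortion described by Eqs. (3)-(5) can be sketched as follows (a minimal Python sketch; function and variable names are our own, not the paper's implementation):

```python
import numpy as np

def fov_distort(p_norm, omega):
    """Apply the FOV distortion model of Devernay & Faugeras to a 2d
    point in normalized image coordinates (distortion parameter omega)."""
    r = np.linalg.norm(p_norm)
    if r < 1e-8 or abs(omega) < 1e-8:
        return p_norm.copy()  # no distortion near the optical center
    # distorted radius as a function of the undistorted radius, Eq. (5)
    r_d = np.arctan(2.0 * r * np.tan(omega / 2.0)) / omega
    return p_norm * (r_d / r)

def pinhole_project(p_cam, fu, fv, cu, cv, omega):
    """Project a 3d landmark in camera coordinates to pixel coordinates."""
    p_norm = p_cam[:2] / p_cam[2]          # perspective division, Eq. (4)
    p_dist = fov_distort(p_norm, omega)    # lens distortion, Eq. (5)
    # focal length scaling and principal point offset, Eq. (3)
    return np.array([fu * p_dist[0] + cu, fv * p_dist[1] + cv])
```

For example, a landmark on the optical axis projects onto the principal point regardless of the distortion parameter.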
The measurement model for observations of landmarks expressed in the global frame G (see Fig. 2) can be written as:

    z_ij = h( (T_GI_j · T_IC)^{-1} · l_i^G ; θ_c ) + η_ij        (6)

where z_ij denotes the projection of the landmark l_i^G onto the image plane of keyframe j, T_GI_j the pose of the sensor system, T_IC the relative pose of the camera w.r.t. the IMU, and η_ij a white Gaussian noise process with zero mean and standard deviation σ_c, i.e., η_ij ~ N(0, σ_c^2 I_2). The full calibration state of the camera model can be summarized as:

    θ_c = [ q_IC^T, t_IC^T, f_u, f_v, c_u, c_v, ω ]^T

where the camera-IMU relative pose is split into its rotation part q_IC and its translation part t_IC, (f_u, f_v) is the focal length, (c_u, c_v) the principal point, and ω the distortion parameter of the lens distortion model.
III-C Inertial Model
The IMU considered in this work consists of a (low-cost) MEMS 3-axis accelerometer and a 3-axis gyroscope. As in the work of [28, 14, 15], we include the alignment of the non-orthogonal sensing axes and a correction of the measurement scale in our sensor model. Further, we assume the translation between the accelerometer and gyroscope to be small (single-chip IMU) and model only a rotation between the two sensors (as shown in Fig. 2).
Considering these effects, we can write the model for the gyroscope measurements as:

    ω̃ = M_g · ω + b_g + n_g        (7)

where ω denotes the true angular velocity of the system, M_g a correction matrix accounting for the scale and misalignment of the individual sensing axes (see Eq. (15)), and b_g a random walk process as:

    ḃ_g(t) = n_bg(t)        (8)

with the zero-mean white Gaussian noise processes being defined as:

    E[ n_g(t) n_g^T(t') ] = σ_g^2 · δ(t − t') · I_3        (9)
    E[ n_bg(t) n_bg^T(t') ] = σ_bg^2 · δ(t − t') · I_3        (10)
Similarly, the specific force measurements of the accelerometer are modeled as:

    ã = M_a · R_AI · R_IG · (a_G − g_G) + b_a + n_a        (11)

where a_G is the true acceleration of the sensor system w.r.t. the inertial frame G, R_AI the relative orientation between the gyroscope and accelerometer frames, R_IG the orientation of the IMU w.r.t. the inertial frame G, M_a a correction matrix for the scale and misalignment (see Eq. (15)), and g_G the gravity acceleration expressed in the inertial frame G. The bias b_a is defined as a random walk process as:

    ḃ_a(t) = n_ba(t)        (12)

with the zero-mean white Gaussian noise processes being defined as:

    E[ n_a(t) n_a^T(t') ] = σ_a^2 · δ(t − t') · I_3        (13)
    E[ n_ba(t) n_ba^T(t') ] = σ_ba^2 · δ(t − t') · I_3        (14)
The noise characteristics of the IMU are assumed to have been identified beforehand at nominal operating conditions, e.g., using the method described in [32]. The correction matrices M_g and M_a accounting for the scale and misalignment errors are defined identically for the gyroscope and accelerometer and are partitioned as:

    M = [ s_x  0    0   ]
        [ m_1  s_y  0   ]
        [ m_2  m_3  s_z ]        (15)

where the collection of all misalignment and all scale factors is denoted as:

    m = [ m_1, m_2, m_3 ]^T,   s = [ s_x, s_y, s_z ]^T        (16)
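A nominal sensor (unit scale, no misalignment) then corresponds to an identity correction matrix. A minimal sketch of this parameterization, assuming a lower-triangular layout for Eq. (15) (names are illustrative, not the paper's code):

```python
import numpy as np

def correction_matrix(m, s):
    """Scale/misalignment correction matrix for one inertial sensor.

    m : (3,) misalignment parameters [m_1, m_2, m_3]
    s : (3,) per-axis scale factors  [s_x, s_y, s_z]
    Assumes the lower-triangular layout sketched for Eq. (15).
    """
    m1, m2, m3 = m
    sx, sy, sz = s
    return np.array([[sx,  0.0, 0.0],
                     [m1,  sy,  0.0],
                     [m2,  m3,  sz]])

def predict_gyro(w_true, m, s, b_g):
    """Forward gyroscope model of Eq. (7), noise term omitted:
    predicted measurement = M_g * w + b_g."""
    return correction_matrix(m, s) @ w_true + b_g
```

With zero misalignment and unit scale, the predicted measurement reduces to the true angular velocity plus bias.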
The full calibration state of the inertial model can then be summarized as:

    θ_i = [ m_g^T, s_g^T, m_a^T, s_a^T, q_AI^T ]^T        (17)

where q_AI describes the rotation of the gyroscope frame w.r.t. the accelerometer frame (with the IMU frame I being defined as the gyroscope frame).
IV Visual-Inertial Self-Calibration
In this section, we formulate the self-calibration problem for visual and inertial sensor systems using the sensor models introduced in the previous section. The derived maximum-likelihood (ML) estimator makes use of all images and inertial measurements within the dataset to yield a full-batch solution. The motion of the sensor system and the (sparse) scene structure are jointly estimated with the model parameters to achieve self-calibration without the need for a known calibration target (e.g., a checkerboard pattern). The batch estimator will serve as a basis for introducing the segment-based calibration, which considers only the most informative segments of a trajectory (see Section V).
IV-A System State and Measurements
The self-calibration formulation jointly estimates all keyframe states x_k, all point landmarks l_i, and the calibration parameters θ_c and θ_i of the camera and the IMU, with the keyframe state being defined as:

    x_k = [ p_k^T, q_k^T, v_k^T, b_g_k^T, b_a_k^T ]^T        (18)

where p_k and q_k define the pose of the sensor system at timestep k, v_k the velocity of the system, and b_g_k and b_a_k the biases of the gyroscope and accelerometer.
To simplify further notation, we collect all states of the problem in the following vectors:

    X = [ x_1^T, ..., x_K^T ]^T,   L = [ l_1^T, ..., l_N^T ]^T        (19)

where K is the total number of keyframes and N the number of landmarks. Additionally, the vector Θ stacks all estimated states as:

    Θ = [ X^T, L^T, θ_c^T, θ_i^T ]^T        (20)

Further, we define the collection Z to contain all IMU measurements and all 2d landmark observations of the camera as:

    Z = { U_1, ..., U_{K−1} } ∪ { z_ij }        (21)

where U_k is the set of all accelerometer and gyroscope measurements between the keyframes k and k+1, z_ij is the 2d measurement of the i-th landmark seen from the j-th keyframe, and K and N denote the number of keyframes and landmarks, respectively.
IV-B State Initialization using VIO
A vision frontend tracks sparse point features between consecutive images and rejects potential outliers based on geometric consistency using a perspective-n-point algorithm in a RANSAC scheme. The resulting feature tracks and the IMU measurements are processed by an EKF which is loosely based on the formulation of [4, 18] but with various extensions to increase robustness and accuracy. The filter recursively estimates all keyframe states and landmark positions. The calibration states are not estimated by this filter, except for the camera-to-IMU relative pose (camera extrinsics). However, for the initialization of the calibration problem, we only use the keyframe states (pose, velocity, biases) and the most recent estimate of the camera-to-IMU extrinsics. The landmark states are initialized by triangulation using the poses estimated by the EKF.
It is important to note that the filter needs sufficiently good calibration parameters in order to run properly and provide accurate initial estimates. In our experience, it is sufficient for most single-chip IMUs to initialize their intrinsic calibration to nominal values (unit scale, no misalignment). However, a complete self-calibration may be difficult if no priors are available for the camera intrinsics. In this case, a specialized calibration method should be used beforehand, e.g., [1, 33].
IV-C ML-based Self-Calibration Problem
We use the framework of ML estimation to jointly infer the states of all keyframes X, landmarks L, and calibration parameters θ_c and θ_i using all available measurements of the IMU and the 2d measurements of the point landmarks extracted from the camera images.
A factor graph representation of the visual-inertial self-calibration formulation is shown in Fig. 3. The problem contains two types of factors: the visual factor models the projection of a landmark onto the image plane of a keyframe, and the inertial factor forms a differential constraint between two consecutive keyframe states (pose, velocity, bias). The ML estimate is obtained by maximizing the corresponding likelihood function p(Z | Θ). When assuming Gaussian noise for all sensor models (see Section III), the ML solution can be approximated by solving the (nonlinear) least-squares problem with the following objective function J(Θ):

    J(Θ) = Σ_{j=1..K} Σ_{i ∈ L(j)} e_c_ij^T W_c_ij e_c_ij + Σ_{k=1..K−1} e_I_k^T W_I_k e_I_k        (22)

where K denotes the number of keyframes, L(j) the set of landmarks observed from keyframe j, e_c_ij the reprojection error of the i-th point landmark observed from the j-th keyframe, and e_I_k the inertial constraint error between two consecutive keyframe states x_k and x_{k+1} as a function of the integrated IMU measurements. The terms W_c_ij and W_I_k denote the inverses of the error covariance matrices of the keypoint measurement and the integrated IMU measurement, respectively. The reprojection error is defined as:

    e_c_ij = z_ij − ẑ_ij        (23)

where z_ij is the 2d measurement of the projection of landmark l_i into camera frame j and ẑ_ij its prediction as defined in Eq. (6). The inertial error is obtained by integrating the continuous equations of motion using the sensor models described in Section III-C and is based on the method described in [18]. The nonlinear objective function is minimized using numerical optimization methods; in our implementation, we use the Levenberg-Marquardt algorithm of the Ceres framework [34].
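The structure of the objective — a sum of Mahalanobis-weighted squared error terms — can be sketched as follows (illustrative Python; the actual error computation is sensor-model specific and is handled by Ceres in the paper's implementation):

```python
import numpy as np

def objective(reproj_errors, reproj_covs, inertial_errors, inertial_covs):
    """Evaluate the nonlinear least-squares objective J as the sum of
    Mahalanobis-weighted squared reprojection and inertial error terms,
    i.e. e^T * P^{-1} * e for each error e with covariance P."""
    J = 0.0
    for e, P in zip(reproj_errors, reproj_covs):
        J += float(e @ np.linalg.solve(P, e))   # visual terms
    for e, P in zip(inertial_errors, inertial_covs):
        J += float(e @ np.linalg.solve(P, e))   # inertial terms
    return J
```

Solving the linear system with the covariance (rather than forming its explicit inverse) is the usual numerically preferable choice.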
V Self-Calibration using Informative Motion Segments
In this section, we propose a method to identify informative segments in a calibration dataset and a modified formulation for estimating calibration parameters based on a set of segments. First, the method can be used to sparsify a dataset and consequently reduce the complexity of the optimization problem. Second, a complete calibration dataset can be built over time by accumulating informative segments from multiple sessions, thus enabling the calibration of even weakly observable parameters by collecting exciting motion that occurs eventually. It is important to note that the proposed method is presented for the use case of visual-inertial calibration, but it can be applied to arbitrary calibration problems.
V-A Architecture
A high-level overview of the modules and data flows is shown in Fig. 4. The proposed method is intended to run in parallel to an existing visual-inertial motion estimation system. The VIO implementation used in this work is described in Section IV-B, but it is important to note that the method is not tied to a particular motion estimation framework. The keyframe and landmark states estimated by the VIO module are partitioned into segments. In a next step, the information content of each segment w.r.t. the calibration parameters is evaluated using an efficient information-theoretic metric. A database maintains the most informative segments of the trajectory, and a calibration is triggered once enough data has been collected. This algorithm is summarized in Alg. 1 and explained in more detail in the following sections.
V-B Evaluating the Information Content of Segments
The continuous stream of keyframe states (pose, velocity, bias) and landmark states estimated by the VIO is partitioned into motion segments. The i-th segment is made up of consecutive keyframes and the set of landmarks observed from this segment.
We propose to use information metrics that only consider the constraints within each segment to evaluate its information content w.r.t. the calibration parameters θ. Using such an information metric, which is independent of all other segments, makes its evaluation very efficient at the cost of neglecting cross-terms coming from other segments, such as loop-closure constraints. However, the neglected constraints can be reintroduced and considered during the calibration. Thus, this assumption only affects the selection of informative segments and potentially leads to a conservative estimate of the actual information, but should not bias the calibration results.
To quantify the information content of the i-th segment, we recover the marginal covariance Σ_θ of the calibration parameters given all the constraints within the segment. For this, we first approximate the covariance over all segment states using the Fisher Information Matrix as:

    Σ ≈ (H^T W H)^{-1}        (24)

The matrix H represents the stacked Jacobians of all error terms, and W the stacked inverse error covariances corresponding to the error terms as:

    H = [ J_1^T, ..., J_M^T ]^T,   W = diag(W_1, ..., W_M)        (25)

where the Jacobians are taken w.r.t. the collection of all states within the segment, and M denotes the number of error terms within the segment. Further, the state ordering is chosen such that the rightmost columns of H correspond to the states of the calibration parameters θ.
A rank-revealing QR decomposition is used to obtain L^{-1} H = Q R, with L being the Cholesky factor of the error covariance matrix (W^{-1} = L L^T). Eq. (24) can then be rewritten as:

    Σ ≈ (R^T R)^{-1}        (26)

As R is an upper-triangular matrix, we can obtain the marginal covariance Σ_θ of the calibration parameters efficiently by back-substitution.
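Assuming a whitened Jacobian with the calibration parameters ordered last, the marginal covariance can be recovered as sketched below (a simplified NumPy sketch that ignores rank deficiency; names are ours):

```python
import numpy as np

def marginal_covariance(J_whitened, k):
    """Marginal covariance of the last k states (the calibration
    parameters) from a whitened stacked Jacobian J.

    Exploits that for J = Q R the full covariance is
    (J^T J)^{-1} = R^{-1} R^{-T}, whose trailing k x k block only
    involves the trailing k x k triangular block of R.
    """
    R = np.linalg.qr(J_whitened, mode='r')
    n = R.shape[1]
    R_kk = R[n - k:, n - k:]          # trailing triangular block
    R_kk_inv = np.linalg.inv(R_kk)    # back-substitution in practice
    return R_kk_inv @ R_kk_inv.T
```

The result matches the trailing block of the full inverse Fisher information while never forming the full (and much larger) covariance matrix.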
In a next step, we normalize the marginal covariance to account for the different scales of the calibration parameters:

    Σ_θ' = D^{-1} Σ_θ D^{-1},   D = diag(σ_1*, ..., σ_n*)        (27)

where σ_k* is the expected standard deviation of the k-th calibration parameter, obtained empirically from a set of segments from various datasets. It is important to note that these values depend on the sensor setup (e.g., focal length, dimensions, etc.) and should either be re-evaluated for each setup, or a normalization based on nominal calibration parameters should be performed.
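This normalization amounts to dividing each covariance entry by the product of the corresponding expected standard deviations (a minimal sketch; names are ours):

```python
import numpy as np

def normalize_covariance(cov, expected_std):
    """Rescale a marginal covariance so that parameters with different
    units and scales contribute comparably to the scalar metrics:
    entry (i, j) is divided by sigma_i* * sigma_j*."""
    d = 1.0 / np.asarray(expected_std)
    return cov * np.outer(d, d)
```

A diagonal covariance whose variances equal the squared expected standard deviations normalizes to the identity.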
We can now define different information metrics based on the normalized marginal covariance Σ_θ'. These metrics will be used to compare segments based on their information content w.r.t. the calibration parameters. They are defined such that a lower value corresponds to more information. In this work, we investigate the three most common information-theoretic metrics from optimal design theory:
V-B1 A-Optimality
This criterion seeks to minimize the trace of the covariance matrix, which corresponds to minimizing the mean variance of the calibration parameters. The corresponding information metric is defined as:

    m_A = tr(Σ_θ')        (28)
V-B2 D-Optimality
This criterion minimizes the determinant of the covariance matrix, which corresponds to maximizing the (differential) Shannon information of the calibration parameters:

    m_D = det(Σ_θ')        (29)

It is interesting to note that this criterion is equivalent to the minimization of the differential entropy h(θ), which for a Gaussian distribution is defined as:

    h(θ) = (1/2) · log( (2πe)^n · det(Σ_θ') )        (30)

where n is the dimension of the normalized normal distribution of θ.
V-B3 E-Optimality
This design seeks to minimize the maximal eigenvalue of the covariance matrix, with the metric being defined as:

    m_E = λ_max(Σ_θ')        (31)
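All three criteria can be computed directly from the normalized marginal covariance, e.g. as follows (for D-optimality we evaluate the log-determinant, a common numerically robust choice that preserves the ordering of segments):

```python
import numpy as np

def a_optimality(cov):
    """Trace of the covariance: minimizes the mean parameter variance."""
    return float(np.trace(cov))

def d_optimality(cov):
    """Log-determinant of the covariance: equivalent (up to constants)
    to minimizing the differential entropy of a Gaussian."""
    sign, logdet = np.linalg.slogdet(cov)
    return float(logdet)

def e_optimality(cov):
    """Largest eigenvalue: variance along the worst-constrained direction."""
    return float(np.linalg.eigvalsh(cov)[-1])  # eigvalsh sorts ascending
```

Lower values of each metric correspond to more informative segments, matching the convention above.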
V-C Collection of Informative Segments
We want to maximize the information contained within a fixed-size budget of segments. For this reason, we maintain a database with a maximum capacity of segments, retaining only the most informative segments of the trajectory. The information metric is used to decide which segments are retained and which are rejected, such that the sum of the information metric over all segments in the database is minimized. Such a decision scheme ensures that the accumulated information on the calibration parameters increases over time while the number of segments remains constant. Therefore, an upper bound on the complexity of the calibration problem can be guaranteed. However, it is important to note that the sum of information metrics is only a conservative approximation of the total information content, for two reasons: First, each information metric is only a scalar and therefore carries no directional information. Second, the information metrics neglect any cross-terms to other segments and thus underestimate the true information.
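The retention policy above — keep a fixed number of segments while minimizing the sum of their metrics — can be sketched with a heap that always evicts the currently least informative segment (illustrative Python; not the paper's implementation):

```python
import heapq

class SegmentDatabase:
    """Fixed-capacity store of the most informative segments.
    A lower metric value means more information, so we keep the
    segments with the smallest metrics by evicting the current worst."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []     # min-heap on -metric: worst segment at the root
        self._counter = 0   # tie-breaker so segments are never compared

    def add(self, metric, segment):
        entry = (-metric, self._counter, segment)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:  # new segment beats the current worst
            heapq.heapreplace(self._heap, entry)

    def segments(self):
        """Segments ordered from most to least informative."""
        return [seg for _, _, seg in sorted(self._heap, reverse=True)]
```

Each insertion costs O(log K) for a budget of K segments, so the bookkeeping stays cheap even on constrained platforms.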
V-D Segment Calibration Problem
The segment-based calibration differs from the batch estimator introduced in Section IV in that it only contains the most informative segments of a (multi-session) dataset. The removal of trajectory segments from the original problem leads to two main challenges.
First, the time difference between two (temporally neighboring) keyframes can become arbitrarily large when non-informative keyframes have been removed in between. An illustration of such a dataset with a temporal gap due to keyframe removal is shown in Fig. 5 (between keyframes 6/10 and 12/16). In this case, we only constrain the bias evolution between the two neighboring keyframes using the random walk model described in Section III-C, and no constraints are introduced for the remaining keyframe states (pose, velocity).
Second, the removal of non-informative trajectory segments often creates partitions of keyframes that are constrained to other partitions neither through (sufficient) shared landmark observations nor through inertial constraints. Each of these partitions can be seen as a (nearly) independent calibration problem that only shares the calibration states with the other partitions. Assuming non-degenerate motion and sufficient visual constraints, each of these partitions contains the two structurally unobservable modes of the visual-inertial optimization problem, namely the rotation around the gravity vector (yaw in the global frame) and the global position. These modes are eliminated from the optimization by keeping them constant for exactly one keyframe in each partition, to achieve efficient convergence of the iterative solvers.
We identify the partitions based on the co-visibility of landmarks and the connectivity through inertial constraints. An overview of the algorithm is shown in Alg. 2. In a first step, all segments that are direct temporal neighbors, and thus connected through inertial constraints, are joined into larger segments (e.g., segments 1 and 2). In a next step, we use a union-find data structure to iteratively partition the joined segments into disjoint sets (partitions) such that the number of co-observed landmarks between the partitions lies below a certain threshold. At this point, all keyframes within a partition are constrained w.r.t. each other, either through inertial measurements or through sufficient landmark co-observations. It is important to note that degenerate landmark configurations are still possible using such a heuristic metric. However, an error will only influence the convergence rate of the incremental optimization and should not bias the calibration results.
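The partitioning step can be sketched with a standard union-find structure, merging any two (joined) segments whose number of co-observed landmarks reaches the threshold (illustrative names and interface; the paper's Alg. 2 may differ in detail):

```python
class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def partition_segments(n_segments, coobservations, threshold):
    """Merge segments sharing at least `threshold` co-observed landmarks.

    coobservations: dict mapping (i, j) segment-index pairs to the
    number of landmarks observed by both segments. Returns the set of
    partition roots (one per independent calibration sub-problem)."""
    uf = UnionFind(n_segments)
    for (i, j), count in coobservations.items():
        if count >= threshold:
            uf.union(i, j)
    return {uf.find(i) for i in range(n_segments)}
```

Each resulting root identifies one partition, in which the unobservable yaw and global-position modes would then be fixed for exactly one keyframe.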
VI Experimental Setup
This section introduces the experiments, datasets, and hardware used to evaluate the proposed method. The results are discussed in the next section.
VI-A Single-/Multi-Session Database
We evaluate the proposed method using two different strategies to maintain informative segments in the database. Each strategy is investigated using a set of multi-session datasets and discussed along with a suitable use case:
VI-A.1 Single-session Database: Observability-aware Sparsification of Calibration Datasets
Each session starts with an empty segment database, and the most informative segments from this single session are kept. After each session, a segment-based calibration is performed using all segments in the database, and the calibration parameters are updated for use in the next session. This strategy can be seen as an observability-aware sparsification method for calibration datasets. It is well suited for infrequent and long sessions (e.g. a navigation use case with many still phases) where a batch calibration over the entire dataset would be too expensive and data selection is necessary.
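Maintaining a fixed number of most informative segments can be sketched with a min-heap keyed on a scalar information score; the class and method names below are illustrative assumptions, not the implementation used here:

```python
import heapq

class SegmentDatabase:
    """Keep the k most informative trajectory segments.

    A min-heap over (information, segment_id) tuples lets the least
    informative segment be evicted in O(log k) whenever a better one
    arrives. The information score would come from one of the
    optimality criteria; here it is just a float.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # (information, segment_id)

    def insert(self, segment_id, information):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (information, segment_id))
        elif information > self._heap[0][0]:
            # Replace the current least informative segment.
            heapq.heapreplace(self._heap, (information, segment_id))

    def segments(self):
        """Segment ids, most informative first."""
        return [sid for _, sid in sorted(self._heap, reverse=True)]
```

The single-session strategy would reset such a database after each session, while the multi-session strategy below keeps it across sessions.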
VI-A.2 Multi-session Database: Accumulation of Information over Time
The multi-session strategy does not reset the database between sessions, and the most informative segments are collected from multiple consecutive sessions. In contrast to the single-session strategy, it is particularly suited for frequent and short sessions, for example in an AR/VR use case where a user performs many short sessions over a short period of time. It accumulates information from multiple sessions and thus enables the calibration of weakly observable modes which might not be sufficiently excited in a single session.
Table II: Overview of the calibration and evaluation datasets

dataset | avg. length / duration | avg. linear / angular vel. | description
AR/VR use case:
office room (5 sessions) | 23.8 m / 117.3 s | 0.20 m/s / 20.3 deg/s | well-lit, good texture
class room (5 sessions) | 37.4 m / 122.9 s | 0.29 m/s / 29.62 deg/s | well-lit, open space, good texture
Navigation use case:
parking garage (3 sessions) | 168.4 m / 305.0 s | 0.57 m/s / 20.51 deg/s | dark, low-texture walls, open space
office building (3 sessions) | 164.8 m / 295.6 s | 0.55 m/s / 23.12 deg/s | well-lit, good texture, corridors
Evaluation datasets:
Vicon room (15 sessions) | 59.7 m / 114.1 s | 0.49 m/s / 42.95 deg/s | motion-capture data, well-lit
VI-B Datasets and Hardware
All datasets were recorded using a Google Tango tablet as shown in Fig. 6. This device uses a high field-of-view global-shutter camera (10 Hz) and a single-chip MEMS IMU (100 Hz). The measurements of both sensors are time-stamped in hardware on a single clock for accurate synchronization. Additionally, the sensor rig is equipped with markers for external tracking by a Vicon motion-capture system. All datasets were recorded on the same device within a short period of time, while trying to keep the environmental factors (e.g. temperature) constant to minimize potential variations of the calibration parameters across the datasets and sessions.
We have collected datasets representative of each of the two use cases introduced in the previous section in different environments (office, class room, and garage). These datasets consist of multiple sessions that are used to obtain a calibration with the proposed method. Right after recording the calibration datasets, we collected a batch of evaluation datasets with motion-capture ground-truth. These datasets are used to evaluate the motion estimation accuracy that can be achieved using the obtained calibration parameters. An overview of all datasets and their characteristics is shown in Table II and Fig. 7.
While recording the calibration datasets, we tried to achieve the following characteristics representative of the two use cases:
VI-B.1 AR/VR use case
We collected datasets that mimic an AR/VR use case to evaluate whether we can accumulate information from multiple sessions (multi-session database strategy). Characteristic of this use case, the datasets consist of multiple short sessions restricted to a small indoor space (a single room), containing mostly fast rotations, only slow and minor translations, and stationary phases. Two datasets have been recorded, in a class room and in an office room, each containing sessions that are min long.
VI-B.2 Navigation use case
In contrast to the AR/VR use case, the navigation sessions contain mostly translation over an area spanning multiple rooms and only slow rotations, but they also include stationary and rotation-only phases. Datasets have been recorded in two locations, a garage and an office building; each contains sessions with a duration of min. These datasets are used to evaluate the observability-aware sparsification (single-session database strategy).
VI-C Evaluation Method
For performance evaluation, we calibrate the sensor models on each session of a dataset in temporal order, using the calibration parameters obtained from the previous session as initial values. The first session uses a nominal calibration consisting of the relative pose between camera and IMU from CAD values, nominal values for the IMU intrinsics (unit scale factors, no axis misalignment), and nominal camera intrinsics.
This calibration scheme is performed for all datasets and for both of the database strategies to obtain a set of calibration parameters for each session. The quality of the obtained calibration parameters is then evaluated using the following methods:
VI-C.1 Motion estimation performance
As the main objective of our work is to calibrate the sensor system for ego-motion estimation, we use the accuracy of the motion estimation (based on our calibrations) as the main evaluation metric. We run all evaluation datasets for each set of calibration parameters and evaluate the accuracy of the estimated trajectory against the ground-truth from the motion-capture system. The motion estimation error is obtained by first performing a spatio-temporal alignment of the estimated and the ground-truth trajectory. Second, a relative pose error is computed at each timestep between the two trajectories. To compare different runs, we use the root-mean-square error (RMSE) calculated over all the relative pose errors.
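Assuming trajectories that are already spatio-temporally aligned and sampled at common timesteps, the translational part of this metric can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def translation_rmse(est_positions, gt_positions):
    """RMSE over per-timestep position errors of two trajectories
    that have already been spatio-temporally aligned (time offset
    and rigid transform removed).

    est_positions, gt_positions: (N, 3) arrays of positions sampled
    at the same timesteps.
    """
    est = np.asarray(est_positions, float)
    gt = np.asarray(gt_positions, float)
    errors = np.linalg.norm(est - gt, axis=1)  # error magnitude per timestep
    return np.sqrt(np.mean(errors ** 2))
```

The rotational RMSE is computed analogously over the angles of the relative rotations at each timestep.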
VI-C.2 Parameter repeatability
We only evaluate the parameter repeatability over different calibrations of the same device, as no ground-truth for the calibration parameters is available. We have recorded all datasets close in time while keeping the environmental conditions (e.g. temperature) similar and avoiding any shocks, to minimize potential variations of the calibration parameters between the datasets.
VII Results and Discussion
In this section, we discuss the results of our experiments (Section VI) along the following questions:

- Section VII-A: How accurate are motion estimates based on calibrations derived only from informative segments? How do they compare to the non-sparsified (batch) calibration?
- Section VII-B: Does the sparsified calibration yield calibration parameters similar to the (full) batch problem?
- Section VII-D: Can we accumulate informative segments from multiple sessions and perform a calibration where an individual session would not provide enough excitation for a reliable calibration?
- Section VII-C: How do the three different information metrics compare? Can we outperform a random selection of segments?
- Section VII-F: Which segments are selected as informative? What are their properties?
- Section VII-G: How do we select the number of segments to retain in the database?
Table III: RMSE on VIO trajectory vs. motion-capture ground-truth (translation [cm] / rotation [deg], mean ± standard deviation; the sparsified variants use 8 segments of 4 seconds each)

dataset | initial calibration | no sparsification (batch) | E-optimality | D-optimality | A-optimality | random | joint EKF
AR/VR: office room | 13.07 ± 9.10 cm | 1.50 ± 0.89 cm | 1.62 ± 0.60 cm | 1.76 ± 0.59 cm | 1.79 ± 0.62 cm | 3.99 ± 2.49 cm | 1.86 ± 1.17 cm
 | 1.18 ± 0.60 deg | 0.47 ± 0.26 deg | 0.34 ± 0.12 deg | 0.37 ± 0.13 deg | 0.35 ± 0.13 deg | 0.64 ± 0.34 deg | 0.49 ± 0.27 deg
AR/VR: class room | 13.09 ± 9.13 cm | 1.41 ± 1.07 cm | 1.79 ± 0.75 cm | 1.28 ± 0.54 cm | 1.42 ± 0.57 cm | 5.45 ± 5.81 cm | 2.44 ± 1.71 cm
 | 1.17 ± 0.59 deg | 0.46 ± 0.25 deg | 0.35 ± 0.12 deg | 0.34 ± 0.12 deg | 0.35 ± 0.12 deg | 0.77 ± 0.54 deg | 0.52 ± 0.32 deg
NAV: parking garage | 13.09 ± 9.13 cm | 4.66 ± 34.73 cm | 1.65 ± 0.56 cm | 2.14 ± 1.03 cm | 1.59 ± 0.59 cm | 4.97 ± 3.56 cm | 3.04 ± 1.81 cm
 | 1.17 ± 0.59 deg | 0.57 ± 0.62 deg | 0.31 ± 0.11 deg | 0.38 ± 0.14 deg | 0.31 ± 0.11 deg | 0.73 ± 0.43 deg | 0.55 ± 0.29 deg
NAV: office building | 13.13 ± 9.17 cm | 1.86 ± 1.17 cm | 1.68 ± 0.62 cm | 1.39 ± 0.49 cm | 1.26 ± 0.45 cm | 2.32 ± 1.18 cm | 2.56 ± 1.60 cm
 | 1.16 ± 0.57 deg | 0.51 ± 0.27 deg | 0.41 ± 0.14 deg | 0.34 ± 0.12 deg | 0.35 ± 0.12 deg | 0.50 ± 0.27 deg | 0.60 ± 0.35 deg
VII-A Motion Estimation Performance using the Observability-aware Sparsification (Single-session Database)
In this experiment, we use a database of segments ( seconds each), which leads to a reduction of the session size by around % in the AR/VR use case and % in the navigation use case. To evaluate the observability-aware sparsification, we select the most informative segments for all sessions of a dataset independently. A segment-based calibration is then run over the selected segments to obtain an updated set of calibration parameters for each session. Finally, the VIO motion estimation accuracy is evaluated for each calibration on all of the evaluation datasets as described in Section VI-C.1. The resulting statistics of the RMSE are shown in Table III for each dataset. The mean of the rotation states corresponds to the rotation angle of the averaged quaternion as described in [35], and the standard deviation is derived from the rotation angles between the samples and the averaged quaternion. For comparison, the same evaluations have been performed for the initial and the batch calibration (no sparsification).
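The rotation statistics above rely on quaternion averaging as in [35], where the average is the eigenvector of the accumulated outer-product matrix with the largest eigenvalue. A minimal sketch (the [w, x, y, z] convention and function names are our own assumptions):

```python
import numpy as np

def average_quaternion(quats):
    """Average of unit quaternions (rows, [w, x, y, z]) following
    Markley et al.: the eigenvector of the accumulated outer-product
    matrix associated with the largest eigenvalue.
    """
    Q = np.asarray(quats, float)
    M = Q.T @ Q                      # sum of outer products q q^T
    eigvals, eigvecs = np.linalg.eigh(M)
    q_avg = eigvecs[:, -1]           # eigh sorts ascending; take largest
    return q_avg if q_avg[0] >= 0 else -q_avg  # fix the sign ambiguity

def rotation_angle_deg(q_a, q_b):
    """Angle of the relative rotation between two unit quaternions."""
    dot = abs(float(np.dot(q_a, q_b)))
    return np.degrees(2.0 * np.arccos(min(dot, 1.0)))
```

The reported standard deviation then follows from `rotation_angle_deg` evaluated between each sample and the averaged quaternion.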
The calibrations obtained with the sparsified dataset yield very similar motion estimation performance to full batch calibrations. This indicates that the proposed method can indeed sparsify the calibration problem while retaining the relevant portion of the dataset, and still provide a calibration with motion estimation performance close to the non-sparsified problem. It is interesting to note that the sparsification to a fixed number of segments keeps the complexity of the calibration problem bounded, while the complexity of the batch problem is (potentially) unbounded when used on large datasets with redundant and non-informative sections.
VII-B Repeatability of Estimated Calibration Parameters
As we have no ground-truth for the calibration parameters, we can only evaluate their repeatability across multiple calibrations of the same device. The statistics over all calibration parameters obtained with all sessions of the class room datasets are shown in Table IV. We used the same sparsification parameters as in Section VII-A ( segments, each seconds).
The experiments show that the deviation between the full-batch and the sparsified solution remains insignificant in mean and standard deviation even though % of the trajectory has been removed. This is a good indication that the sparsified calibration problem is a good approximation of the complete problem.
Table IV: Statistics of the estimated calibration parameters (mean ± standard deviation)

parameter | proposed method (sparsified) | batch (complete dataset) | joint EKF (complete dataset)
[px] | 255.79 ± 0.60 | 256.30 ± 0.22 | –
 | 255.68 ± 0.67 | 256.31 ± 0.27 | –
[px] | 313.63 ± 0.67 | 313.19 ± 0.63 | –
 | 241.62 ± 1.17 | 243.16 ± 0.18 | –
[] | 0.9203 ± 0.0009 | 0.9208 ± 0.0008 | –
[] | 2.82e-03 ± 1.32e-03 | 2.11e-03 ± 2.27e-04 | 2.39e-03 ± 2.06e-03
 | 4.33e-03 ± 4.83e-03 | 4.02e-03 ± 2.70e-04 | 7.71e-03 ± 3.08e-03
 | 1.21e-03 ± 5.18e-04 | 1.54e-03 ± 4.18e-04 | 2.61e-03 ± 3.90e-03
[] | 9.70e-03 ± 1.50e-02 | 1.85e-02 ± 3.07e-03 | 1.64e-02 ± 6.54e-03
 | 1.16e-02 ± 1.17e-02 | 1.65e-02 ± 1.19e-03 | 1.24e-02 ± 5.59e-03
 | 1.95e-02 ± 7.38e-03 | 1.86e-02 ± 1.48e-03 | 1.34e-02 ± 2.43e-03
[] | 3.22e-04 ± 1.69e-03 | 7.36e-04 ± 6.56e-04 | 1.03e-03 ± 8.78e-04
 | 2.37e-03 ± 1.95e-03 | 3.96e-04 ± 2.30e-04 | 7.36e-04 ± 1.32e-03
 | 6.78e-04 ± 1.60e-03 | 4.95e-05 ± 1.17e-03 | 9.82e-04 ± 1.77e-03
[deg] | 1.897 ± 0.428 | 1.504 ± 0.010 | 1.368 ± 0.150
[] | 2.11e-02 ± 1.11e-02 | 1.35e-02 ± 1.54e-03 | 1.68e-02 ± 5.05e-03
 | 3.68e-02 ± 1.11e-02 | 2.78e-02 ± 2.59e-03 | 2.76e-02 ± 6.78e-03
 | 7.93e-03 ± 9.30e-03 | 3.19e-03 ± 1.21e-03 | 7.92e-04 ± 2.99e-03
[m] | 1.06e-03 ± 4.01e-03 | 4.93e-03 ± 2.33e-03 | 5.43e-03 ± 3.68e-03
 | 4.62e-03 ± 1.86e-02 | 7.05e-04 ± 2.17e-03 | 4.09e-04 ± 2.85e-03
 | 1.48e-02 ± 1.12e-02 | 6.09e-03 ± 4.08e-03 | 1.19e-02 ± 6.77e-03
[deg] | 1.174 ± 0.133 | 1.065 ± 0.071 | 0.753 ± 0.069
VII-C Comparison of Information Metrics
In Section V-B, we have proposed three different information metrics to compare trajectory segments w.r.t. their information content on the calibration parameters. The same evaluation performed for the sparsification use case (Section VII-A) has been repeated for each of the proposed metrics and, as a baseline, also for calibrations based on randomly selected segments. The motion estimation errors based on these calibrations are reported in Table III.
The motion estimation error is around times larger when randomly selecting the same amount of data, indicating that the proposed metrics successfully identify informative segments for calibration. It is important to note that this comparison heavily depends on the ratio of informative to non-informative motion in the dataset, and this error might therefore be larger when there is less excitation in a given dataset. In general, all three metrics show comparable performance; however, the A-optimality criterion performed slightly better on the navigation and the D-optimality criterion on the AR/VR use case.
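The three criteria can be illustrated on the eigenvalues of the information (Fisher) matrix restricted to the calibration parameters. The following sketch only illustrates the standard definitions; the exact matrices and normalizations used in this work are not reproduced here:

```python
import numpy as np

def optimality_metrics(info_matrix):
    """A-, D- and E-optimality scores of a symmetric information
    matrix; larger means more informative.

    A: harmonic mean of the eigenvalues (penalizes weak directions),
    D: geometric mean (relates to the volume of the confidence ellipsoid),
    E: smallest eigenvalue (worst-case direction).
    """
    eigvals = np.linalg.eigvalsh(np.asarray(info_matrix, float))
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against numerical zeros
    return {
        "A": eigvals.size / np.sum(1.0 / eigvals),
        "D": np.exp(np.mean(np.log(eigvals))),
        "E": float(eigvals.min()),
    }
```

For an isotropic information matrix all three scores coincide; anisotropy lowers A and E faster than D, which is one way to see why the criteria can rank segments differently.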
VII-D Accumulation of Information over Time: Single- vs. Multi-session Database
In this section, we evaluate whether the proposed method can accumulate informative segments from multiple consecutive sessions to obtain a better and more consistent calibration than the individual sessions would yield. This is especially important in scenarios where a single session often would not provide enough excitation for a reliable calibration. The evaluations were performed on the AR/VR use-case datasets, which consist of multiple short sessions. We use the A-optimality criterion to select the most informative segments of each session and maintain them in the database ( segments, seconds). In contrast to the sparsification use case from Section VII-A, the database is not reset between sessions; in other words, it collects the most informative segments from the first up to the current session. After each session, a calibration is triggered using all segments in the database. These calibrations are then used to evaluate the motion estimation error on all evaluation datasets. The results are shown in Fig. 8 for the class room dataset.
The evaluation shows that the motion estimation error decreases as the number of sessions (from which informative segments have been selected) increases. Further, the motion estimation error is smaller than that of calibrations based on the most informative segments of the individual sessions. After around sessions, the estimation performance is close to what would be achieved using a batch calibration. This indicates that the proposed method can accumulate information from multiple sessions while the number of segments in the database remains constant. It can therefore provide a reliable calibration when a single session would not provide enough excitation.
VII-E Comparison vs. Joint EKF
In this section, we compare the proposed method against an EKF that jointly estimates the motion, the scene structure, and the calibration parameters (similar to [18]). In our implementation, we only estimate the IMU intrinsics and the relative pose between the camera and the IMU. The camera intrinsics are not estimated but set to parameters obtained with a batch calibration on the same dataset.
We evaluated the motion estimation errors on all datasets and report the results in Table III. The resulting calibration parameters are compared to the proposed method and the batch solution in Table IV. The evaluations show a position error that is up to times larger than for calibrations obtained with the proposed method or a batch calibration. When looking at the state evolution of, e.g., the misalignment factors, as shown in Fig. 9 for one of the datasets, it can be seen that the estimate converges roughly to the batch estimate but does not remain stable over time. We see this as an indication that the local scope of the EKF is not able to infer weakly observable states properly, and thus a segment-based (sliding-window) approach is beneficial in providing a stable and consistent solution over time.
VII-F Selected Informative Segments
In this section, we investigate the motion that is selected as informative by the proposed method. Fig. 10 shows the most informative segments that have been selected in one of the sessions of the navigation use case. We only show the first minute of the session, as otherwise the trajectory would start to overlap. It can be seen that the information metric correlates with changes in linear and rotational velocity; therefore, mostly segments containing turns have been selected, while straight segments have been found to be less informative. This experiment confirms the intuition that segments with larger accelerations and rotational velocities are more informative for calibration.
VII-G Influence of Database Size on the Calibration Quality
In this experiment, we investigate the effect of the database size on the calibration quality to find the minimum amount of data required for a reliable calibration. We repeatedly sparsify all sessions of all datasets to retain to of the most informative segments. A segment-based calibration is then run on each of the sparsified datasets, and the motion estimation error is evaluated on all evaluation datasets. The segment duration was chosen as seconds from geometrical considerations, such that segments span a sufficiently large distance for landmark triangulation under the assumption that the system moves at a steady walking speed. The median of the RMSE over all evaluation datasets is shown in Fig. 11.
The motion estimation error seems to stabilize when using more than segments. Based on these experiments, we have selected a database size of segments as a reasonable trade-off between calibration complexity and quality and used this value for all evaluations in this work. It is important to note that the amount of data required for a reliable calibration depends on the sensor models, the expected motion, and the environment, and a re-evaluation might become necessary if these parameters change. In future work, we plan to investigate methods to determine the information content of the database directly to avoid the selection of this parameter.
VII-H Runtime
Table V reports the measured runtimes of the proposed method and the batch calibration for the experiments of Section VII-A. Both optimizations use the same number of steps and the same initial conditions.
It is important to note that the complexity, and thus the runtime, of the batch method is unbounded as the duration of the sessions increases. The runtime of the proposed method, however, remains constant, as we only include a constant amount of informative data. This property makes the proposed method well-suited for systems performing long sessions.

Table V: Measured runtimes

 | proposed method | batch
VIO (each image) | 0.003 s | 0.003 s
Data selection (each segment) | 0.156 s | –
Calibration (each dataset) | 12.050 s | 27.028 s
VIII Conclusion
We have proposed an efficient self-calibration method for visual and inertial sensors which runs in parallel to an existing motion estimation framework. In a background process, an information-theoretic metric is used to quantify the information content of motion segments, and a fixed number of the most informative segments are maintained in a database. Once enough data has been collected, a segment-based calibration is triggered to update the calibration parameters. With this method, we are able to collect exciting motion in a background process and provide a reliable calibration under the assumption that such motion occurs eventually, making this method well-suited for consumer devices whose users often do not know how to excite the system properly.
An evaluation on motion-capture ground-truth shows that the calibrations obtained with the proposed method achieve motion estimation performance comparable to full batch calibrations. However, we can limit the computational complexity by only considering the most informative part of a dataset and thus enable calibration even on long sessions and resource-constrained platforms where a full-batch calibration would be infeasible. Further, our evaluations show that we can not only sparsify single-session datasets but also accumulate information from multiple sessions and thus perform reliable calibrations when a single session would not provide enough excitation. The comparison of the three information metrics indicates that A-optimality could be selected for navigation purposes, while D-optimality looks like a good compromise for AR/VR applications.
In future work, we would like to investigate methods to dynamically determine the segment boundaries instead of using a fixed segment length, and also to account for temporal variations in the calibration parameters by detecting and removing outdated segments from the database.
Acknowledgements
We would like to thank Konstantine Tsotsos, Michael Burri and Igor Gilitschenski for the valuable discussions and inputs. This work was partially funded by Google’s Project Tango.
References
[1] J. Rehder, J. Nikolic, T. Schneider, T. Hinzmann, and R. Siegwart, "Extending kalibr: Calibrating the extrinsics of multiple IMUs and of individual axes," in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 4304–4311.
[2] T. Schneider, M. Li, M. Burri, J. Nieto, R. Siegwart, and I. Gilitschenski, "Visual-inertial self-calibration on informative motion segments," in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 6487–6494.
[3] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual–inertial odometry using nonlinear optimization," The International Journal of Robotics Research, vol. 34, no. 3, pp. 314–334, 2015.
[4] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in IEEE International Conference on Robotics and Automation (ICRA), 2007, pp. 3565–3572.
[5] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, "Robust visual inertial odometry using a direct EKF-based approach," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 298–304.
[6] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[7] T. Schneider, M. T. Dymczyk, M. Fehr, K. Egger, S. Lynen, I. Gilitschenski, and R. Siegwart, "maplab: An open framework for research in visual-inertial mapping and localization," IEEE Robotics and Automation Letters, 2018.
[8] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[9] J. Alves, J. Lobo, and J. Dias, "Camera-inertial sensor modelling and alignment for visual navigation," Machine Intelligence and Robotic Control, vol. 5, no. 3, pp. 103–112, 2003.
[10] J. Lobo and J. Dias, "Relative pose calibration between visual and inertial sensors," The International Journal of Robotics Research, vol. 26, no. 6, pp. 561–575, 2007.
[11] F. M. Mirzaei and S. I. Roumeliotis, "A Kalman filter-based algorithm for IMU-camera calibration: Observability analysis and performance evaluation," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1143–1156, 2008.
[12] D. Zachariah and M. Jansson, "Joint calibration of an inertial measurement unit and coordinate transformation parameters using a monocular camera," in International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2010, pp. 1–7.
[13] P. Furgale, J. Rehder, and R. Siegwart, "Unified temporal and spatial calibration for multi-sensor systems," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 1280–1286.
[14] C. Krebs, "Generic IMU-camera calibration algorithm: Influence of IMU-axis on each other," Autonomous Systems Lab, ETH Zurich, Tech. Rep., 2012.
[15] J. Nikolic, M. Burri, I. Gilitschenski, J. Nieto, and R. Siegwart, "Non-parametric extrinsic and intrinsic calibration of visual-inertial sensor systems," IEEE Sensors Journal, vol. 16, no. 13, pp. 5433–5443, 2016.
[16] J. Kelly and G. S. Sukhatme, "Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration," The International Journal of Robotics Research, vol. 30, no. 1, pp. 56–79, 2011.
[17] A. Patron-Perez, S. Lovegrove, and G. Sibley, "A spline-based trajectory representation for sensor fusion and rolling shutter cameras," International Journal of Computer Vision, vol. 113, no. 3, pp. 208–219, 2015.
[18] M. Li, H. Yu, X. Zheng, and A. I. Mourikis, "High-fidelity sensor modeling and self-calibration in vision-aided inertial navigation," in IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 409–416.
[19] M. Li and A. I. Mourikis, "Online temporal calibration for camera–IMU systems: Theory and algorithms," The International Journal of Robotics Research, vol. 33, no. 7, pp. 947–964, 2014.
[20] A. Richardson, J. Strom, and E. Olson, "AprilCal: Assisted and repeatable camera calibration," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 1814–1821.
[21] R. Bähnemann, M. Burri, E. Galceran, R. Siegwart, and J. Nieto, "Sampling-based motion planning for active multirotor system identification," in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 3931–3938.
[22] K. Hausman, J. Preiss, G. S. Sukhatme, and S. Weiss, "Observability-aware trajectory optimization for self-calibration with application to UAVs," IEEE Robotics and Automation Letters, 2017.
[23] J. A. Preiss, K. Hausman, G. S. Sukhatme, and S. Weiss, "Trajectory optimization for self-calibration and navigation," in Robotics: Science and Systems (RSS), 2017.
[24] J. Maye, P. Furgale, and R. Siegwart, "Self-supervised calibration for robotic systems," in IEEE Intelligent Vehicles Symposium (IV), 2013, pp. 473–480.
[25] N. Keivan and G. Sibley, "Constant-time monocular self-calibration," in International Conference on Robotics and Biomimetics (ROBIO), 2014, pp. 1590–1595.
[26] F. Nobre, C. R. Heckman, and G. T. Sibley, "Multi-sensor SLAM with online self-calibration and change detection," in International Symposium on Experimental Robotics. Springer, 2016, pp. 764–774.
[27] F. Nobre, M. Kasper, and C. Heckman, "Drift-correcting self-calibration for visual-inertial SLAM," in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 6525–6532.
[28] M. Li and A. I. Mourikis, "High-precision, consistent EKF-based visual–inertial odometry," The International Journal of Robotics Research, vol. 32, no. 6, pp. 690–711, 2013.
[29] N. Trawny and S. I. Roumeliotis, "Indirect Kalman filter for 3D attitude estimation," University of Minnesota, Dept. of Comp. Sci. & Eng., Tech. Rep., vol. 2, 2005.
[30] Z. Zhang, H. Rebecq, C. Forster, and D. Scaramuzza, "Benefit of large field-of-view cameras for visual odometry," in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 801–808.
[31] F. Devernay and O. Faugeras, "Straight lines have to be straight," Machine Vision and Applications, vol. 13, no. 1, pp. 14–24, 2001.
[32] O. J. Woodman, "An introduction to inertial navigation," University of Cambridge, Computer Laboratory, Tech. Rep. UCAM-CL-TR-696, Aug. 2007.
[33] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
[34] S. Agarwal, K. Mierle, and others, "Ceres solver," http://ceres-solver.org.
[35] F. L. Markley, Y. Cheng, J. L. Crassidis, and Y. Oshman, "Averaging quaternions," Journal of Guidance, Control, and Dynamics, vol. 30, no. 4, pp. 1193–1197, 2007.