Learning Camera Miscalibration Detection

by   Andrei Cramariuc, et al.
ETH Zurich

Self-diagnosis and self-repair are some of the key challenges in deploying robotic platforms for long-term real-world applications. One of the issues that can occur to a robot is miscalibration of its sensors due to aging, environmental transients, or external disturbances. Precise calibration lies at the core of a variety of applications, due to the need to accurately perceive the world. However, while a lot of work has focused on calibrating the sensors, not much has been done towards identifying when a sensor needs to be recalibrated. This paper focuses on a data-driven approach to learn the detection of miscalibration in vision sensors, specifically RGB cameras. Our contributions include a proposed miscalibration metric for RGB cameras and a novel semi-synthetic dataset generation pipeline based on this metric. Additionally, by training a deep convolutional neural network, we demonstrate the effectiveness of our pipeline to identify whether a recalibration of the camera's intrinsic parameters is required or not. The code is available at http://github.com/ethz-asl/camera_miscalib_detection.



I Introduction

In robotics, errors in the estimation of the system’s parameters can adversely affect the accuracy of algorithms for state estimation and the performance of feedback controllers. In order to avoid systematic errors due to incorrect parameter estimates, a common practice is to perform sophisticated calibration of the system by a human expert [18]. Once determined, calibration parameters are kept fixed during the operation cycle of the robot. However, this approach is not sustainable for a variety of real-world applications where robots need to operate in harsh environments for extended periods of time. The calibration parameters of the system are prone to change over time due to component wear, environmental transients such as temperature changes, or external disturbances like collisions. Additionally, it may be impractical to perform offline calibration regularly as a means to address this issue.

An alternative to offline calibration is online calibration, which is performed during the system’s normal operation, as in [17, 21]. These techniques, though promising, are computationally expensive and impose limiting requirements, such as a particular type of motion or the storage and processing of data to build a calibration dataset. Hence, instead of running these methods periodically to recalibrate the robotic platform, one would ideally recalibrate only when the system is detected to be miscalibrated. This objective can be regarded as part of fault detection and diagnosis for a robotic system [26].

Since sensors lie at the core of any autonomous system, it is critical to detect the sensor data faults for safety and stable performance [14]. However, unlike sensors for measuring attitude or temperature, calibration errors in vision sensors do not appear as an offset or as a drift in the sensor’s readings. Although having hardware redundancy is a way to detect imperfections, it increases the cost and complexity of the system. Further, due to their complex nature, it is difficult to obtain a unified analytical solution for identifying miscalibration in vision sensors. Fortunately, common operating environments, both indoors and outdoors, contain regularities that can be exploited for this purpose, such as walls, furniture, street lamps, etc. We propose a data-driven approach that implicitly utilizes these regularities.

Fig. 1: Illustration of the subtle differences that a miscalibration detection system needs to be sensitive to. Top left: Unrectified image; Top right: Correctly rectified image; Bottom: Two examples of incorrectly rectified images. The image is taken from the KITTI dataset [5].

In this work, our goal is not to provide a neural network that detects miscalibrations for any camera. Instead, we propose a method in which a network is tuned for a specific camera to predict when an automatic recalibration is necessary for that camera. However, using a learning-based approach poses its own set of challenges. First, a large-scale dataset for training a network to detect miscalibration is not currently available in the public domain. Second, there is no standard metric for measuring the degree of miscalibration in the intrinsics of a camera. We address these challenges and provide the following key contributions:

  • A novel dataset generation pipeline to create a large-scale dataset for camera miscalibration detection.

  • A metric, average pixel position difference, for estimating the degree of miscalibration and analysis of how it correlates with performance in a monocular odometry task.

  • A deep convolutional neural network (CNN) that predicts when the camera is miscalibrated even in previously unseen scenes.

II Related Work

In the last decade, a variety of approaches have been proposed for calibration of various types of range-based sensors, inertial sensors, and vision sensors, as well as the extrinsic calibration between them. In this section, we focus on literature related to sensor miscalibration, the estimation of intrinsic parameters of vision sensors, and fault detection in multi-sensor systems.

Accurate camera calibration is an important step for a multitude of 3D computer vision tasks. The existing calibration techniques can be broadly categorized as photogrammetric calibration and self-calibration. In photogrammetric calibration, the camera calibration is performed by observing a target of known geometry in 3D space. Over the years, various types of tags such as checkerboards [32, 20] and fiducial markers [3] have been proposed for this purpose. These approaches typically pose the calibration problem as a non-linear optimization problem to minimize a reprojection error and to estimate the most likely values of the camera parameters. However, the need to have an apparatus and a human expert in these techniques prevents them from being scalable or practical for robots deployed into the real world.

On the other hand, self-calibration, as introduced in [11], does not require a calibration object. Through a sequence of images, these methods estimate the intrinsic parameters that are consistent with the underlying projective reconstruction of the observed scene. Certain approaches use camera motion constraints, such as planar motion [2] or rotation of the camera [1], in conjunction with the 3D metric reconstruction of the scene to calibrate the camera’s intrinsic parameters. Sturm [24] presents the concept of critical motion sequences, i.e. motions of a camera with constant parameters for which no unique self-calibration solution exists. Wildenauer and Hanbury [27] detect orthogonal vanishing points in the scene to generate a hypothesis for the focal length. However, the flexibility provided by self-calibration techniques comes at a higher computational cost.

More recently, with the advent of deep learning [7], data-driven approaches have also been proposed to estimate the calibration parameters of the camera. Workman et al. [29] propose a CNN for estimating the focal length of an image. To train the network, they construct a dataset by combining images and camera models estimated using 1D structure from motion [28]. On the other hand, Lopez et al. [9] use separate regressors, which share a common pre-trained network architecture, to estimate tilt, roll, focal length, and radial distortion parameters from a single image. They use the SUN360 panorama dataset [30] to artificially generate the training images. However, the estimation of these parameters relies heavily on finding the horizon in the image, an assumption that holds only in certain environments. Unlike the previous two approaches, which aim to estimate the camera calibration parameters, Yin et al. [31] propose an end-to-end multi-context deep network for removing distortions from single fish-eye camera images. They use a scene-parsing network to provide semantic cues during training and use a reconstruction loss for rectified image prediction.

Fault detection in multi-sensor systems can be done by correlating the information from multiple sensors and rejecting measurements that do not match [12, 25, 19, 10, 22]. These methods are generally able to detect when a fault has occurred; however, they rely on a redundant sensor setup. When the fault estimation is done indirectly, through an intermediary task such as localization performance, it can be ambiguous whether the sensor is at fault or the localization system failed.

While some of the above-mentioned approaches deal with estimating the calibration parameters through either a geometry-based or a learning-based approach, our work is orthogonal and does not aim to replace them. We want to complement these methods by identifying when a camera needs to be recalibrated. Further, we do not want to rely on sensor redundancy since that increases the cost of the system. Thus, our objective is to detect miscalibration in a single camera. To the best of our knowledge, this is the first work proposing a deep learning approach to detect miscalibration of the intrinsic parameters for an RGB camera.

III Methodology

In this section, we present our contributions in detail. The dataset generation pipeline is explained in Section III-A. In Section III-B, our novel metric for miscalibration is defined. The neural network and its training are detailed in Section III-C.

III-A Dataset Generation

A straightforward procedure to create a dataset for camera miscalibration detection is to manually vary the camera parameters by using different lenses while taking any one of the settings as the nominal one. However, this process is time-consuming and tedious since offline calibration would be required for every new setting. The procedure is also limited with respect to the generation of disturbances in the camera parameters. Some cameras have only one degree of freedom for calibration (the distance between the lens and the sensor), hence the calibration parameters cannot be varied independently. Due to these limitations, we propose an alternative solution to generate a semi-synthetic dataset by using a set of raw images and a set of correct calibration parameters for a given camera setup. The presented method is based on the idea that the visual effect obtained from rectifying an image from a miscalibrated sensor with its initial belief of the parameters is similar to the effect of rectifying an image from a calibrated sensor with parameters different from the correct ones.

In our semi-synthetic dataset generation pipeline, we consider the pinhole camera model with radial and tangential distortion [8]. We denote the set of true calibration parameters of the camera model as $\theta^*$. Consider the raw camera image $I$, which is rectified using the parameters $\theta$ to obtain the rectified image $I_\theta$. The rectification map $M_\theta$ used in this process relates each pixel in the rectified image to a position in the original image. Generally, not all the pixels in the rectified image have a corresponding position in the original one. Therefore, we define a validity mask, $V_\theta$, which is the largest rectangular region in the rectified image that contains only valid pixels and has the same aspect ratio as the original image $I$. The final sample image $\tilde{I}_\theta$ is obtained by first cropping the valid mask region of the image $I_\theta$ and then rescaling the result to the size of the original raw image $I$. An alternative way to express the rectification is to apply the rectification map $\tilde{M}_\theta$ (which is obtained by cropping and rescaling $M_\theta$) to the raw image $I$ to directly obtain the final sample image $\tilde{I}_\theta$.

In general, it is difficult to obtain the true calibration parameters $\theta^*$ of the camera. Thus, we use the values $\theta_c$ estimated using a calibration toolbox as the correct calibration parameters and denote the correct rectified image and rectification map as $\tilde{I}_c$ and $\tilde{M}_c$ respectively. To obtain samples of miscalibrated images, we perturb each intrinsic parameter independently to obtain $\theta_p$. This process allows generating arbitrarily many miscalibrated images, $\tilde{I}_p$, and rectification maps, $\tilde{M}_p$, by randomly perturbing the parameters. Thus, by collecting only a set of raw images with a correct calibration of the sensor, one can generate a large amount of data for detecting camera miscalibration. Even though we consider a pinhole camera model, this approach is also applicable to other camera models.
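The perturbation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a distortion model with two radial and two tangential coefficients, it omits the validity-mask cropping and rescaling, and the function names, the calibration values, and the perturbation ranges are all hypothetical.

```python
import random


def rectified_to_raw(u, v, p):
    """Map a pixel (u, v) of the rectified image to its position in the raw image."""
    # Normalized coordinates of the ideal (undistorted) pinhole ray.
    x = (u - p["cx"]) / p["fx"]
    y = (v - p["cy"]) / p["fy"]
    r2 = x * x + y * y
    # Radial and tangential distortion (assumed two coefficients each).
    radial = 1.0 + p["k1"] * r2 + p["k2"] * r2 * r2
    xd = x * radial + 2.0 * p["p1"] * x * y + p["p2"] * (r2 + 2.0 * x * x)
    yd = y * radial + p["p1"] * (r2 + 2.0 * y * y) + 2.0 * p["p2"] * x * y
    # Back to pixel coordinates in the raw image.
    return p["fx"] * xd + p["cx"], p["fy"] * yd + p["cy"]


def rectification_map(p, width, height):
    """Rectification map as a height x width grid of raw-image positions."""
    return [[rectified_to_raw(u, v, p) for u in range(width)]
            for v in range(height)]


def perturb(p, rng, rel=0.05, dist=0.05):
    """Perturb every intrinsic parameter independently (ranges are assumptions)."""
    out = dict(p)
    for name in ("fx", "fy", "cx", "cy"):
        out[name] = p[name] * (1.0 + rng.uniform(-rel, rel))
    for name in ("k1", "k2", "p1", "p2"):
        out[name] = p[name] + rng.uniform(-dist, dist)
    return out


# Made-up "correct" calibration for a tiny 64 x 48 image.
correct = {"fx": 60.0, "fy": 60.0, "cx": 32.0, "cy": 24.0,
           "k1": -0.2, "k2": 0.05, "p1": 0.001, "p2": -0.001}
perturbed = perturb(correct, random.Random(0))
map_correct = rectification_map(correct, 64, 48)
map_perturbed = rectification_map(perturbed, 64, 48)
```

Each call to `perturb` yields a new parameter set and hence a new rectification map, so a single set of raw images can be reused to produce arbitrarily many miscalibrated samples.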

Fig. 2: Top: An unrectified image from the KITTI dataset [5]. Bottom: For illustration purposes, Canny edges detected in correctly (green) and incorrectly (red) rectified images are shown. An incorrect set of rectification parameters displaces the pixel projections from the raw image relative to those obtained with the correct parameters (indicated by the black segments). The mean of the L2-norms of these displacements over the image corresponds to the APPD.
Fig. 3: The network architecture used to run the experiments. All layers except the last one use Rectified Linear Unit (ReLU) activation functions.

III-B Metric for Degree of Miscalibration

As described in Section III-A, image rectification is a transformation parameterized by the calibration parameters. Since these parameters are continuous, one can generate images arbitrarily close to a correctly rectified image by applying small perturbations to the true calibration parameters. Due to the non-linear effects and the strong correlations of these parameters on the rectification transformation, defining a meaningful distance metric to directly assess the quality of different randomly chosen calibrations is difficult. Moreover, as the rectification of an input image is typically only the first stage of a system, the degree of miscalibration should be considered in conjunction with the corresponding reduction in the overall system performance. Therefore, we propose an indirect approach using the average pixel position difference (APPD) as a scalar metric to measure the degree of camera miscalibration.

Using the symbols introduced in Section III-A, the numeric value for the APPD, denoted by $\delta$, is calculated using the rectification maps $\tilde{M}_c$ and $\tilde{M}_p$ obtained from using calibration parameters $\theta_c$ and $\theta_p$ respectively. Since these maps are computed using different parameters, they relate the same pixel coordinate in their corresponding rectified images to a different image coordinate in the raw image. The Euclidean distance between these two positions is referred to as the pixel position difference. This is illustrated in Figure 2. The APPD is the mean value of these pixel position differences over the entire image, i.e.

$$\delta = \frac{1}{WH} \sum_{u=1}^{W} \sum_{v=1}^{H} \left\lVert \tilde{M}_c(p_{uv}) - \tilde{M}_p(p_{uv}) \right\rVert_2$$

where $W \times H$ is the size of the image and $p_{uv}$ denotes the pixel coordinate $(u, v)$. Although this mean is already normalized by the number of pixels, its value in pixels still depends on the image resolution. Dividing it by the length of the image diagonal removes this dependence. We therefore report APPD values as a percentage of the image diagonal, i.e. they are divided by the diagonal and multiplied by 100.
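The definition above translates directly into code. The following sketch assumes that each rectification map is stored as a height x width grid of raw-image positions; the function name `appd_percent` and the toy maps are hypothetical and only serve to illustrate the computation.

```python
import math


def appd_percent(map_a, map_b):
    """Average pixel position difference, as a percentage of the image diagonal."""
    height = len(map_a)
    width = len(map_a[0])
    total = 0.0
    for row_a, row_b in zip(map_a, map_b):
        for (xa, ya), (xb, yb) in zip(row_a, row_b):
            # Pixel position difference: Euclidean distance in the raw image.
            total += math.hypot(xa - xb, ya - yb)
    mean_difference = total / (width * height)
    return 100.0 * mean_difference / math.hypot(width, height)


# Toy example on a 4 x 3 image: the second map is shifted by (3, 4) pixels,
# so every pixel position difference is 5 pixels, equal to the image diagonal.
identity_map = [[(float(u), float(v)) for u in range(4)] for v in range(3)]
shifted_map = [[(u + 3.0, v + 4.0) for u in range(4)] for v in range(3)]
print(appd_percent(identity_map, shifted_map))
```

Because the result is expressed relative to the diagonal, the same perturbation gives comparable values when the maps are computed at different resolutions.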

Fig. 4: APPD prediction accuracy of the trained neural network models for the two RGB cameras from KITTI [5]. The plots show the distributions of the network predictions for given quantized APPD values. The dashed line designates perfect prediction.

III-C Network Architecture and Training

The architecture of the APPD prediction network is presented in Figure 3. The input to the network is the rectified image, and the output is the APPD metric. To prevent artifacts and loss of minute details due to image resizing, we use the input at full resolution.

For each camera and corresponding correct calibration, we train a separate network to deploy alongside the respective camera. This can be seen as an addition to the calibration procedure that, in a similar manner, is also pre-computed for each camera separately. The goal of the dataset generation process described in Section III-A is to reduce the amount and variety of data required to train the model. With the proposed method, it is sufficient to collect a single dataset, with correct calibration known and without any manual labeling.

During training, the perturbed parameters are sampled such that the calculated APPD values follow an approximately uniform distribution. Additionally, 1% of the samples are kept with the correct rectification, i.e. APPD value of zero. We use a mean squared error loss between the network predictions and the ground-truth labels for training. This loss is optimized by using the Adaptive Moment Estimation (ADAM) optimizer [16]. We initialize the network parameters by Xavier’s initialization method [6] and use dropout [23] to avoid overfitting.
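The text does not state how the approximately uniform APPD distribution is obtained. One possible way, shown here purely as an assumption, is rejection sampling with a per-bin quota: candidate perturbations are drawn at random and kept only while their APPD bin is not yet full. The function `sample_uniform_appd`, the bin count, and the toy stand-ins `draw_focal_scale` and `toy_appd` are hypothetical; in practice `appd_of` would compute the APPD from the rectification maps.

```python
import random


def sample_uniform_appd(draw_params, appd_of, n_samples, max_appd,
                        n_bins=10, zero_fraction=0.01, rng=None,
                        max_tries=100_000):
    """Collect (params, appd) pairs whose APPD values are roughly uniform."""
    rng = rng or random.Random()
    # A fixed share of samples keeps the correct rectification (APPD of zero);
    # params=None marks "use the correct calibration".
    n_zero = round(zero_fraction * n_samples)
    samples = [(None, 0.0) for _ in range(n_zero)]
    # Per-bin quota (ceiling division) for the remaining samples.
    quota = -(-(n_samples - n_zero) // n_bins)
    counts = [0] * n_bins
    tries = 0
    while len(samples) < n_samples and tries < max_tries:
        tries += 1
        params = draw_params(rng)
        value = appd_of(params)
        if not 0.0 < value < max_appd:
            continue  # outside the range of interest
        bin_index = min(int(value / max_appd * n_bins), n_bins - 1)
        if counts[bin_index] >= quota:
            continue  # this APPD range is already well represented
        counts[bin_index] += 1
        samples.append((params, value))
    return samples


# Toy stand-ins: a single focal-length scale factor and a made-up APPD model.
def draw_focal_scale(rng):
    return rng.uniform(0.8, 1.2)


def toy_appd(scale):
    return 10.0 * abs(scale - 1.0)


samples = sample_uniform_appd(draw_focal_scale, toy_appd, n_samples=200,
                              max_appd=2.0, rng=random.Random(0))
```

The quota trades sampling time for balance: smaller bins give a flatter label distribution but reject more candidates.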

IV Experiments and Discussion

The results from evaluating the trained models are discussed in Section IV-A. We present some generalization results for our approach in Section IV-B. The relationship between APPD and the intrinsic parameters is experimentally investigated in Section IV-C. Section IV-D compares APPD and reprojection error.

Fig. 5: Plots showing the different effects of the camera’s intrinsic parameters on the APPD when one parameter is varied and the rest are kept fixed. The x-axis is the multiplication factor applied to the reference parameter. The used reference calibration is from camera 2 for the KITTI sequences from date . Note that the y-axis scales in the plots for $f_x$ and $f_y$, as well as for $c_x$ and $c_y$, are different. This is due to the image’s aspect ratio.

IV-A Detection of Miscalibrations with a Neural Network

The KITTI dataset has two RGB cameras (cameras 2 and 3). We split the KITTI sequences from September 26, 2011 to obtain our training and validation sets. For testing, we use all sequences from the other four days. We vary the focal lengths from to , the optical center , and the distortion coefficients . The dataset provides different calibration files for every day, which were observed to be inconsistent. It is not known whether the cameras differed physically on different days, or if the differences in the calibrations arise from imperfections in the calibration procedure. Therefore, there is no single ‘correct’ calibration that can be used as a reference for calculating the true APPD value when evaluating prediction performance. Instead, we consider two cases: (i) taking the set of parameters corresponding to the day used for training, and (ii) using the set corresponding to the day on which the test image was actually recorded.

Figures 4a and 4b show the prediction quality of the trained networks for both camera 2 and camera 3, evaluated for the two cases described above. The mean absolute error (MAE) for each case is also reported. It can be seen that the models also generalize to images and environments they have not seen before. While both networks are effective at detecting miscalibration with respect to the reference set of parameters that they were trained with, the one for camera 2 performs significantly better.

This mismatch in performance is caused by the level of similarity between the sets of correct calibration parameters provided for each camera. The APPD ranges for the four test days, relative to the day used for training and validation are and for camera 2 and camera 3 respectively. It is likely that camera 3 might not have been well-calibrated either on the training day or on some of the other days. This result illustrates the importance of selecting a ‘correct’ calibration, with respect to which the training process must be defined.

As the two cameras are of the same make and brand, are positioned solely with a horizontal offset from one another, and operate in the same environment, the transferability of the model trained on the data from one camera to the other was also evaluated. The corresponding results are shown in Figure 4c. Indeed, the model trained on camera 2 generalizes well to camera 3. Figure 4c also shows that the reverse generalization does not hold, which stresses the importance of the choice of reference calibration and is another indication that camera 3 might have been slightly less consistently calibrated.

Figure 4 further demonstrates that the trained models experience bias in the extremely low and extremely high APPD values. This is a limitation of both training in a regression setting and of the miscalibration sampling procedure, which provides very few miscalibrations with APPD values close to 0. Instead, if one targets a specific performance metric, which can be related to APPD (see Section IV-D), then they can determine a threshold value and rephrase the problem into a binary classification setting.
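Rephrasing the problem as binary classification amounts to thresholding the APPD. The sketch below is only illustrative: the function names are hypothetical, and the threshold value is a made-up placeholder that would have to be derived from the targeted performance metric.

```python
def to_binary_labels(appd_values, threshold):
    """Turn APPD regression targets into labels: 1 = recalibration needed."""
    return [1 if value > threshold else 0 for value in appd_values]


def needs_recalibration(predicted_appd, threshold):
    """Decision rule applied to a single network prediction."""
    return predicted_appd > threshold


# Placeholder threshold in percent of the image diagonal (an assumption).
APPD_THRESHOLD = 0.5
labels = to_binary_labels([0.0, 0.2, 0.7, 1.5], APPD_THRESHOLD)
print(labels, needs_recalibration(0.9, APPD_THRESHOLD))
```

The same threshold can be used either to relabel the training data for a classifier or to post-process the output of the regression network.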

While the presented neural network architecture is simple and further performance improvement may be possible, the above results indicate that a CNN can indeed be trained to be sensitive to miscalibration artifacts. One should note that the data does not explicitly designate the regularities which are not robust to the perturbation effects arising from disturbing a camera setup, but the model has discovered these regularities on its own. Moreover, even though motion distortion and blur are not explicitly addressed by the analysis, they are represented in both the training and test sets (as the images are obtained from a moving vehicle), and therefore the results account for them as well.

IV-B Generalization to New Environments and Cameras

Fig. 6: APPD prediction accuracy when evaluating on environments and camera positions that were not represented in the training set. The training was performed on a subset of the nuScenes dataset [4] recorded with the front-center camera in Boston. The plots show the distribution of predictions for given quantized APPD values. The dashed line designates perfect prediction.

The KITTI dataset is limited in the variation of its scenes (recordings only in the city of Karlsruhe, Germany), and in the position of the camera sensors (both oriented forward with only a horizontal offset between them). In order to study the potential further generalization capabilities of the proposed method, we trained the same model on some of the scenes recorded in Boston from the forward camera of the nuScenes dataset [4]. Only the scenes recorded during the day were considered.

The performance of the model was evaluated on the other scenes of the same camera in Boston, as well as on the forward camera in Singapore, and the forward-left and forward-right cameras in Boston. The results can be seen in Figure 6 and show that the accuracy is comparable in the four cases. The sets of intrinsic calibration parameters for the four cameras considered are almost the same and hence much closer in terms of APPD than the ones from the KITTI dataset (less than ). This consistency between the different sensors, as well as the higher resolution of the images, explains why the model trained on nuScenes exhibits better performance.

IV-C Relationship between APPD and Calibration Parameters

Some of the effects of the different intrinsic parameters on the APPD value are illustrated in Figure 5. The plots are obtained by individually varying one parameter while keeping the others fixed. The difference in effect when varying $f_x$ and $c_x$ compared to $f_y$ and $c_y$, respectively, is due to the wide aspect ratio (2.72) of the image. This causes parameters along the $x$-axis to have a stronger effect on the distortion of the images. Another point to observe from Figure 5 is the noise, which is more visible at lower APPD values. This noise is due to quantization effects causing numerical imprecision when computing the undistortion maps. The amount of noise can be reduced, at the cost of more computation time, by calculating the APPD at a higher image resolution, but this is not necessary for practical purposes.

IV-D Relationship between APPD and Reprojection Error

Reprojection error is a standard measure of the deterioration of a robotic system’s performance in various vision-related tasks [15]. Therefore, it is of interest to relate the APPD metric of a misrectification to the reprojection error it causes. As mentioned in Section III-A, the physical scenario that we are interested in is when a camera experiences a hardware change without the corresponding change in intrinsic parameters. Data for such scenarios is difficult to obtain. Therefore we propose applying the reverse process: the physical sensor stays the same while the parameters are changed. We perform a few simple tests on simulated data to further analyze the relationship between APPD and reprojection error.

First, consider the case when the camera is kept unchanged, but the robot’s belief of its intrinsic calibration is changed. One can generate a set of points in front of a virtual camera and then project them into the camera plane using the correct intrinsic parameters of the physical camera. These points can then be rectified with both the correct and incorrect sets of parameters. Since point associations are known, the reprojection error can be calculated as the average distance between the resulting rectified projections in the image plane. Figure 7a shows the obtained relationship. Indeed, APPD is a good measure of the reprojection error that arises from rectifying with a wrong calibration parameter set.
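The simulated test described above can be sketched as follows. This is one interpretation under stated assumptions rather than the authors' exact procedure: it assumes a distortion model with two radial and two tangential coefficients, inverts the distortion with a simple fixed-point iteration (adequate only for mild distortion), and ignores cropping and rescaling. All names and numeric values are hypothetical.

```python
import math
import random


def distort(x, y, p):
    """Apply radial and tangential distortion to normalized coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + p["k1"] * r2 + p["k2"] * r2 * r2
    xd = x * radial + 2.0 * p["p1"] * x * y + p["p2"] * (r2 + 2.0 * x * x)
    yd = y * radial + p["p1"] * (r2 + 2.0 * y * y) + 2.0 * p["p2"] * x * y
    return xd, yd


def project(point, p):
    """Project a 3D point into the raw (distorted) image."""
    X, Y, Z = point
    xd, yd = distort(X / Z, Y / Z, p)
    return p["fx"] * xd + p["cx"], p["fy"] * yd + p["cy"]


def rectify(pixel, p, iterations=20):
    """Undistort a raw pixel according to the parameters p."""
    xd = (pixel[0] - p["cx"]) / p["fx"]
    yd = (pixel[1] - p["cy"]) / p["fy"]
    x, y = xd, yd
    for _ in range(iterations):
        # Fixed-point iteration: correct the guess by the residual.
        ex, ey = distort(x, y, p)
        x += xd - ex
        y += yd - ey
    return p["fx"] * x + p["cx"], p["fy"] * y + p["cy"]


def reprojection_error(points, true_params, believed_params):
    """Mean distance between projections rectified with both parameter sets."""
    total = 0.0
    for point in points:
        raw = project(point, true_params)  # the physical camera is unchanged
        good = rectify(raw, true_params)
        bad = rectify(raw, believed_params)
        total += math.hypot(good[0] - bad[0], good[1] - bad[1])
    return total / len(points)


rng = random.Random(0)
points = [(rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0), rng.uniform(5.0, 10.0))
          for _ in range(100)]
true_params = {"fx": 700.0, "fy": 700.0, "cx": 640.0, "cy": 360.0,
               "k1": -0.2, "k2": 0.05, "p1": 0.001, "p2": -0.001}
believed_params = dict(true_params, fx=710.0, k1=-0.18)
print(reprojection_error(points, true_params, believed_params))
```

Swapping the roles of the two parameter sets, i.e. projecting with varying parameters and rectifying with a fixed set, gives the second, physical scenario discussed next.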

Second, it is of interest to know how the real physical scenario relates to the reverse synthetic scenario, in order to evaluate whether the method outlined here can be applied to a real system. This can be achieved by repeating the above-described point-projection and rectification procedure, but keeping the intrinsic parameters for the rectification step fixed while varying the set for the projection step. This setting corresponds exactly to the physical situation but cannot be reproduced synthetically on a real image (we cannot ‘reproject’ reality with a different set of intrinsic parameters). The comparison between the resulting reprojection error and APPD can be seen in Figure 7b. The result is that there is no longer an injective functional relationship from APPD to reprojection error, and the dependence between the two values is less pronounced.

Fig. 7: Relationship between APPD and reprojection error (a) when projecting a set of points with a single set of calibration parameters and rectifying with a variety of parameters, and (b) when projecting a set of points with a variety of calibration parameters and rectifying with a fixed set.

APPD is easy to calculate for a real-world dataset as it is independent of the hardware that is used to obtain the image. Furthermore, it allows the sampling of an almost infinite number of different intrinsic calibration parameters, which is beneficial for training neural networks, which require large amounts of data. Nevertheless, it might not be the most accurate metric for detecting physical miscalibration. In fact, as Figure 8 shows, the reprojection error as computed for the physical scenario is better correlated with the SLAM performance of a system. As mentioned above, the drawback of using reprojection error as a miscalibration metric is that it cannot be calculated for a real-world dataset. No procedure similar to the one in Section III-A can be constructed for the physical miscalibration case and its corresponding reprojection error. Therefore, one would need to create a dataset with various camera settings and the respective calibration parameters for each one, which can be impractically time-consuming as it needs to be repeated for each camera individually. The variety of possible calibrations would also be severely limited by the design of the lens, as most lenses only have one degree of freedom. Alternatively, a fully synthetic dataset, e.g. generated in simulation, can be used, but then transferability to real image data would be questionable.

Fig. 8: Relationship between the performance of ORB-SLAM [13], evaluated on the KITTI odometry sequence 10, and (a) the reprojection error when projecting a set of points with a single set of parameters and rectifying with a variety of sets, and (b) the corresponding APPD.

The advantage of using APPD is that it facilitates training with very large sets of data that are easily obtained via the procedure detailed in Section III-A. The type of artifacts introduced by the dataset generation in Section III-A can be considered similar to the ones introduced by a physical miscalibration. By demonstrating that APPD is learnable by a neural network, we show that it might be possible to also learn the reprojection error.

V Conclusion

We proposed a novel semi-synthetic data generation procedure that requires no data labeling and a corresponding camera miscalibration metric called the average pixel position difference (APPD). These tools can then be used to train a simple CNN, which we show is able to predict the APPD values from images with no additional data necessary. The performance of the network was evaluated on different real-world datasets and cameras. Provided the camera’s true intrinsic parameters remained close, the network was able to generalize well to different cameras and environments that it had not seen before. Such a network can then be deployed on a real robotic platform, running at a very low frequency, to determine if a more expensive recalibration procedure needs to be executed.


The authors would like to thank Jen Jen Chung, Lionel Ott, Juan Nieto and Davide Scaramuzza for their feedback and valuable insights.


  • [1] L. Agapito, E. Hayman, and I. Reid (2001) Self-calibration of rotating and zooming cameras. International Journal of Computer Vision 45 (2), pp. 107–127. Cited by: §II.
  • [2] M. Armstrong, A. Zisserman, and R. Hartley (1996) Self-calibration from image triplets. In European Conference on Computer Vision, pp. 1–16. Cited by: §II.
  • [3] B. Atcheson, F. Heide, and W. Heidrich (2010) Caltag: high precision fiducial markers for camera calibration. In 15th International Workshop on Vision, Modeling and Visualization, Vol. 10, pp. 41–48. Cited by: §II.
  • [4] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom (2019) nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027. Cited by: Fig. 6, §IV-B.
  • [5] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun (2013) Vision meets robotics: the KITTI dataset. The International Journal of Robotics Research 32 (11), pp. 1231–1237. Cited by: Fig. 1, Fig. 2, Fig. 4.
  • [6] X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9, pp. 249–256. Cited by: §III-C.
  • [7] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Note: http://www.deeplearningbook.org Cited by: §II.
  • [8] J. Heikkila (2000) Geometric camera calibration using circular control points. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10), pp. 1066–1077. Cited by: §III-A.
  • [9] M. Lopez, R. Mari, P. Gargallo, Y. Kuang, J. Gonzalez-Jimenez, and G. Haro (2019) Deep single image camera calibration with radial distortion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11817–11825. Cited by: §II.
  • [10] Y. Lu, E. G. Collins Jr, and M. F. Selekwa (2004) Parity relation based fault detection, isolation and reconfiguration for autonomous ground vehicle localization sensors. Technical report Department of Mechanical Engineering, FAMU-FSU College of Engineering. Cited by: §II.
  • [11] S. J. Maybank and O. D. Faugeras (1992) A theory of self-calibration of a moving camera. International Journal of Computer Vision 8 (2), pp. 123–151. Cited by: §II.
  • [12] J. P. Mendoza, M. Veloso, and R. Simmons (2012) Mobile robot fault detection based on redundant information statistics. In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, Vol. 945. Cited by: §II.
  • [13] R. Mur-Artal and J. D. Tardós (2017) ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics 33 (5), pp. 1255–1262. Cited by: Fig. 8.
  • [14] K. Ni, N. Ramanathan, M. N. H. Chehade, L. Balzano, S. Nair, S. Zahedi, E. Kohler, G. Pottie, M. Hansen, and M. Srivastava (2009) Sensor network data fault types. ACM Transactions on Sensor Networks 5 (3), pp. 25. Cited by: §I.
  • [15] L. Oth, P. Furgale, L. Kneip, and R. Siegwart (2013) Rolling shutter camera calibration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1360–1367. Cited by: §IV-D.
  • [16] D. P. Kingma and J. L. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations, pp. 1–13. Cited by: §III-C.
  • [17] J. A. Preiss, K. Hausman, G. S. Sukhatme, and S. Weiss (2018) Simultaneous self-calibration and navigation using trajectory optimization. The International Journal of Robotics Research 37 (13-14), pp. 1573–1594. Cited by: §I.
  • [18] Z. Roth, B. Mooring, and B. Ravani (1987-11) An overview of robot calibration. IEEE Journal of Robotics and Automation 3, pp. 377 – 385. External Links: Document Cited by: §I.
  • [19] S. I. Roumeliotis, G. S. Sukhatme, and G. A. Bekey (1998) Sensor fault detection and identification in a mobile robot. In Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications, Vol. 3, pp. 1383–1388. Cited by: §II.
  • [20] D. Scaramuzza, A. Martinelli, and R. Siegwart (2006) A toolbox for easily calibrating omnidirectional cameras. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5695–5701. Cited by: §II.
  • [21] T. Schneider, M. Li, M. Burri, J. I. Nieto, R. Siegwart, and I. Gilitschenski (2017) Visual-inertial self-calibration on informative motion segments. IEEE International Conference on Robotics and Automation, pp. 6487–6494. Cited by: §I.
  • [22] T. Schneider, M. Li, C. Cadena, J. Nieto, and R. Siegwart (2019) Observability-aware self-calibration of visual and inertial sensors for ego-motion estimation. IEEE Sensors Journal 19 (10), pp. 3846–3860. Cited by: §II.
  • [23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958. Cited by: §III-C.
  • [24] P. Sturm (1997) Critical motion sequences for monocular self-calibration and uncalibrated euclidean reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1100–1105. Cited by: §II.
  • [25] P. Sundvall and P. Jensfelt (2006) Fault detection for mobile robots using redundant positioning systems. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pp. 3781–3786. Cited by: §II.
  • [26] M. L. Visinsky, J. R. Cavallaro, and I. D. Walker (1994) Robot fault detection and fault tolerance: A survey. Reliability Engineering and System Safety 46 (2), pp. 139–158. Cited by: §I.
  • [27] H. Wildenauer and A. Hanbury (2012) Robust camera self-calibration from monocular images of Manhattan worlds. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2831–2838. Cited by: §II.
  • [28] K. Wilson and N. Snavely (2014) Robust global translations with 1DSfM. In European Conference on Computer Vision, pp. 61–75. Cited by: §II.
  • [29] S. Workman, C. Greenwell, M. Zhai, R. Baltenberger, and N. Jacobs (2015) Deepfocal: a method for direct focal length estimation. In IEEE International Conference on Image Processing, pp. 1369–1373. Cited by: §II.
  • [30] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba (2012) Recognizing scene viewpoint using panoramic place representation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2695–2702. Cited by: §II.
  • [31] X. Yin, X. Wang, J. Yu, M. Zhang, P. Fua, and D. Tao (2018) FishEyeRecNet: a multi-context collaborative deep network for fisheye image rectification. In Proceedings of the European Conference on Computer Vision, pp. 469–484. Cited by: §II.
  • [32] Z. Zhang (2000-11) A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11), pp. 1330–1334. External Links: Document, ISSN 0162-8828 Cited by: §II.