Camera Calibration with Pose Guidance

02/19/2021 ∙ by Yuzhuo Ren, et al. ∙ 0

Camera calibration plays a critical role in various computer vision tasks such as autonomous driving and augmented reality. Widely used camera calibration tools rely on plane-pattern-based methods, such as a chessboard or an AprilTag board, and without clear instructions the user's level of calibration expertise significantly affects calibration accuracy and consistency. Furthermore, calibration is a recurring task that has to be performed each time the camera is changed or moved. It is also a great burden to calibrate large numbers of cameras, such as Driver Monitoring System (DMS) cameras on a production line for millions of vehicles. To resolve these issues, we propose a calibration system called Calibration with Pose Guidance, which improves calibration accuracy and reduces calibration variance among different users and among different trials by the same person. Experimental results show that our proposed method achieves more accurate and consistent calibration than traditional calibration tools.




1 Introduction

Camera calibration models and estimates a camera's intrinsic and extrinsic parameters, and is an essential first step for many robotic and computer vision applications [9, 15, 20, 11]. Intrinsic parameters describe the camera's internal characteristics, such as its focal length, principal point, skew, and lens distortion [22]. Extrinsic parameters describe the camera's position and orientation [7, 10]. Knowing a camera's calibration parameters allows us to remove its lens distortion, which is necessary in many applications that demand accuracy, such as vehicle or pedestrian detection with wide field-of-view autonomous vehicle cameras. However, reliable and accurate camera calibration usually requires expert intuition to constrain all of the parameters in the camera model. Existing calibration toolboxes [2, 1] ask users to capture images of a calibration pattern board (chessboard [22, 4, 8], circle grid pattern [5], AprilTag [21, 14], etc.) posed in positions of their choosing, after which the maximum-likelihood calibration parameters are computed using all images in a batch optimization. Tan et al. [18] proposed using a monitor to display poses, but did not consider how to choose the optimal poses to display on the monitor screen. Richardson et al. [12] and Rojtberg et al. [13] proposed pose selection for interactive calibration, which depends on a good pose initialization. Existing calibrators share common issues: 1) Consistency of the calibration result is not guaranteed if a tool is run by users with different levels of expertise. Even for the same user, different runs of the same tool may produce significantly different results. 2) The widely used re-projection error alone is not sufficient to control the error of the estimated parameters. 3) The user has to guess the chessboard poses and whether the number and variation of poses can lead to a successful calibration, which is especially challenging for novices without domain expertise. This can lead to a frustrating user experience and also makes quality control hard [16].

Figure 1: Overview of the proposed camera calibration with pose guidance system. In the first step, our approach generates sets of 3D virtual poses, and the optimal set is selected from many candidate pose sets. Two pose sets, shown in red and blue, are given for illustration. The pose set that maximizes a defined score function is selected as the optimal pose set and deployed in our calibration system in the next step. In the second step, an expected virtual pose is displayed on top of the video stream to guide the user in moving the calibration pattern, with adjustment instructions shown as red arrows.

In this work, we try to close these gaps by proposing the Calibration with Pose Guidance system, the diagram of which is shown in Fig. 1. In the first step, a set of optimal 3D virtual poses is selected using a novel score function, which narrows down the solution search space and avoids degenerate poses. In the second step, the expected virtual poses are displayed one by one on top of the video stream to guide the user in moving the calibration pattern accordingly, with adjustment instructions shown as red arrows.

We summarize our major contributions as follows: 1) A method to automatically generate an optimal set of poses for calibrating a camera. The pose set automatically avoids degenerate cases, such as feeding images captured at the same place into the tool many times, and helps narrow down the solution search space for calibration optimization. 2) A novel score function that evaluates pose sets to find the optimal set for a specific application scenario. 3) A gamified Human Computer Interface (HCI) that is simple and straightforward and guides any user, regardless of expertise, to capture sufficient pre-defined poses accurately and consistently, with visual hints for adjusting the pattern board for each pose. Our method greatly reduces the training time novices need to conduct calibration. The optimal pose set achieves higher accuracy and consistency than calibration with user-chosen poses, especially for novices.

2 Methodology

2.1 Calibration with Pose Guidance System Overview

While the same methodology applies to all kinds of camera models and pattern boards, we use the pinhole camera model and a chessboard [22] to illustrate our proposed method. Denote an arbitrary 3D world point as $M = [X, Y, Z]^T$ and its projected 2D image point as $m = [u, v]^T$; their homogeneous representations are $\tilde{M} = [X, Y, Z, 1]^T$ and $\tilde{m} = [u, v, 1]^T$. Their geometric relationship can be represented by the following equation [22],

$$ s\,\tilde{m} = K\, D\big( [R \mid t]\, \tilde{M} \big) \qquad (1) $$

where $R, t$ are the rotation and translation which relate the world coordinate system to the camera coordinate system, $K$ is the intrinsic matrix, $s$ is the scalar factor, and $D$ is the distortion operator. We use $k_1, k_2, \ldots$ to denote radial distortion, and $p_1, p_2$ to denote tangential distortion. For the intrinsic matrix, we use $f_x, f_y$ to denote the camera focal length, and $c_x, c_y$ to denote the principal point. Camera intrinsic calibration estimates $K$ and the lens distortion $D$. Camera extrinsic calibration estimates $R$ and $t$. Intrinsic and extrinsic parameters can be estimated by an optimization procedure [22] that minimizes the re-projection error in Eq.(2),

$$ \sum_{i=1}^{I} \sum_{j=1}^{J} \left\| m_{ij} - \hat{m}(K, D, R_i, t_i, M_j) \right\|^2 \qquad (2) $$

where $\hat{m}(K, D, R_i, t_i, M_j)$ is the projection of point $M_j$ ($j = 1, \ldots, J$) in image $i$ ($i = 1, \ldots, I$) according to Eq.(1), and $m_{ij}$ is the corresponding detected 2D point for point $M_j$ in image $i$.
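As an illustration of the projection model and the re-projection error described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common Brown-Conrady form of the distortion operator with two radial and two tangential coefficients, and it reports the mean Euclidean pixel distance as the mean re-projection error (MRE). The function names, the 9x6 board with 3 cm squares, and all numeric values are hypothetical.

```python
import numpy as np

def project_points(points_w, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Project Nx3 world points to pixels with a pinhole model plus
    radial (k1, k2) and tangential (p1, p2) distortion (assumed form)."""
    pc = points_w @ R.T + t                 # world -> camera coordinates
    x = pc[:, 0] / pc[:, 2]                 # normalized image coordinates
    y = pc[:, 1] / pc[:, 2]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return np.stack([fx * xd + cx, fy * yd + cy], axis=1)

def mean_reprojection_error(detected, projected):
    """Mean Euclidean distance in pixels between detected and projected points."""
    return float(np.mean(np.linalg.norm(detected - projected, axis=1)))

# Hypothetical chessboard: 9x6 inner corners, 3 cm squares, lying in the plane Z = 0.
cols, rows, square = 9, 6, 0.03
gx, gy = np.meshgrid(np.arange(cols), np.arange(rows))
board = np.stack([gx.ravel() * square, gy.ravel() * square,
                  np.zeros(cols * rows)], axis=1)

# Hypothetical pose: board facing the camera, centered, 0.6 m away.
R = np.eye(3)
t = np.array([-0.12, -0.075, 0.6])
pixels = project_points(board, R, t, fx=1350.0, fy=1350.0, cx=640.0, cy=400.0,
                        k1=-0.38, k2=0.24)

# Simulate detections that are off by 0.5 px in both u and v.
detected = pixels + 0.5
mre = mean_reprojection_error(detected, pixels)
print(pixels.shape, round(mre, 4))
```

In a real calibration, the detected points would come from a corner detector and the parameters would be refined by a nonlinear least-squares solver that minimizes this error.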

Our pose guidance contains two steps, as shown in Fig. 1: optimal 3D virtual pose set selection and pose set deployment, which are described in Sections 2.2 and 2.3 respectively.

2.2 Optimal 3D Virtual Pose Set Selection

We define a pose as the chessboard's posture in 3D space, which can be parameterized as $p = (R, t)$. A pose set $P$ is a set of $N$ such poses, where $N$ is an experimental variable (e.g. $N = 20$ in our experiments), as represented in Eq.(3):

$$ P = \{ p_1, p_2, \ldots, p_N \} \qquad (3) $$
In previous work, no constraints are set on how such a pose set should be selected, and the poses can be picked at random. However, a random pose set can have multiple issues. First, a pose set that contains degenerate cases leads to a singular solution in the calibration optimization step. Second, the poses may not provide sufficient coverage horizontally, vertically, or in terms of distance and rotation angle variation, all of which are critical in many applications, such as accurate distortion parameter estimation.

There are two steps in our proposed optimal 3D virtual pose set selection: 1) proposing candidate virtual pose sets; and 2) defining a score function and searching the candidate sets for the one that maximizes it. Finding the optimal pose set, denoted as $P^{*}$, is difficult because the search space is infinite. However, we can set reasonable constraints according to the specific application when proposing candidate sets, for example by removing all poses whose camera-board distance is greater than 2 meters or whose yaw angle exceeds a certain threshold for a DMS application. Then, we define a novel score function to rank the pose set candidates and select the one with the highest score as the final result.

Generation of High Quality Pose Set. For each computer vision application where camera calibration is required, we define a pose search space $S$, which is the camera's working field-of-view space. Note that $S$ can vary among camera use cases. For example, in DMS we want to ensure that objects within 1 or 2 meters in the camera's field of view are imaged appropriately, but the range of interest can shrink to 10 to 80 centimeters if we are calibrating a smartphone front camera. We assume poses are uniformly sampled in $S$ and select them randomly to avoid degenerate poses, such as repetition of the same pose or missing coverage of a corner. For example, two parallel poses that differ only in their distance to the camera lead to ambiguity in focal length estimation. Degenerate calibration cases result in local-minimum solutions and have been studied extensively [16, 6, 17, 3, 19].
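The candidate generation step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: a pose is represented as six numbers (yaw, pitch, roll in degrees and a translation in meters), the bounds in `DMS_BOUNDS` are hypothetical values loosely following the DMS example above (board no farther than 2 meters), and the near-duplicate test is one simple way to reject a degenerate set with repeated poses.

```python
import numpy as np

# Hypothetical search-space bounds: (yaw, pitch, roll) in degrees, (tx, ty, tz) in meters.
DMS_BOUNDS = [(-40.0, 40.0), (-30.0, 30.0), (-15.0, 15.0),
              (-0.3, 0.3), (-0.2, 0.2), (0.3, 2.0)]

def sample_pose_set(rng, n_poses, bounds):
    """Draw n_poses poses uniformly inside the axis-aligned search space."""
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    return rng.uniform(low, high, size=(n_poses, len(bounds)))

def has_near_duplicates(pose_set, bounds, tol=0.05):
    """Return True if two poses nearly coincide after scaling each axis by its range."""
    span = np.array([b[1] - b[0] for b in bounds], dtype=float)
    scaled = pose_set / span
    dist = np.linalg.norm(scaled[:, None, :] - scaled[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # ignore the distance of a pose to itself
    return bool(dist.min() < tol)

def generate_candidates(rng, n_sets, n_poses, bounds, max_attempts=10000):
    """Collect n_sets candidate pose sets, skipping sets with repeated poses."""
    candidates = []
    for _ in range(max_attempts):
        pose_set = sample_pose_set(rng, n_poses, bounds)
        if not has_near_duplicates(pose_set, bounds):
            candidates.append(pose_set)
        if len(candidates) == n_sets:
            break
    return candidates

rng = np.random.default_rng(0)
candidates = generate_candidates(rng, n_sets=50, n_poses=20, bounds=DMS_BOUNDS)
print(len(candidates), candidates[0].shape)
```

Each candidate set would then be projected with the camera model and ranked with the score function described next.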

Figure 2: Example of two pose sets evaluated using MRE and our proposed score function. The pose set that covers the whole camera field of view evenly and contains a variety of poses receives a higher score, while the bad pose set with many duplicate poses ranks low. Both MRE values would be considered a good calibration under the standard commonly used in industrial practice (i.e. MRE less than 1 or 2 pixels).

Pose Set Ranking. We adapt an iterative procedure [12] to rank the pose set candidates: several poses are selected to estimate the camera model, and the procedure is iterated to cover poses in regions that are not yet well calibrated, after which an updated camera model is estimated. We choose 15-20 poses (an empirical number from our experience, also suggested by many references [2]) that cover the whole camera field of view to estimate the initial camera intrinsic parameters, denoted as $\theta_0$. Note that this initialization step can be skipped if a good estimate of the intrinsic parameters is already known, for example when the camera's factory calibration is available or the same camera model has already been calibrated. Most previous work [22] uses only the Mean Reprojection Error (MRE) to evaluate the quality of a calibration result. However, MRE alone is not sufficient for measuring the accuracy of all intrinsic parameters; MRE can still be very small when intrinsic parameter errors are large, as shown in Fig. 2. Our proposed score function takes both MRE and estimated parameter variance into consideration. The camera parameters $\hat{\theta}_P$ are estimated from each pose set candidate $P$ and compared with the initialized camera model parameters $\theta_0$. The score function that evaluates the quality of each pose set candidate is the reciprocal of the sum of the MRE and the parameter estimation variance, as shown in Eq.(4),

$$ \mathrm{score}(P) = \frac{1}{\alpha \cdot \mathrm{MRE}(P) + \beta \cdot \lVert \hat{\theta}_P - \theta_0 \rVert} \qquad (4) $$

where $\hat{\theta}_P$ is the estimated intrinsic parameters, and $\alpha$ and $\beta$ are parameters that control the cost from the re-projection error and the parameter estimation error.

The optimized pose set $P^{*}$, which obtains the highest score among all the candidate sets, is therefore defined as:

$$ P^{*} = \arg\max_{P} \; \mathrm{score}(P) \qquad (5) $$
The score function favors the pose set that gives both minimum re-projection error and minimum parameter estimation variance. The pose set with the highest score is chosen as the final optimal pose set.
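The ranking and selection described above can be sketched as follows. This is a hedged illustration, not the authors' code: the parameter error is taken here as the mean relative deviation from the reference intrinsics so that focal lengths (in pixels) and distortion coefficients are on comparable scales, which is an assumption, as are the weights `alpha = beta = 1`, the function names, and the two made-up candidates.

```python
import numpy as np

def parameter_error(theta_hat, theta_ref):
    """Mean relative deviation of estimated intrinsics from the reference (assumed metric)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_ref = np.asarray(theta_ref, dtype=float)
    return float(np.mean(np.abs(theta_hat - theta_ref) / np.abs(theta_ref)))

def pose_set_score(mre, theta_hat, theta_ref, alpha=1.0, beta=1.0):
    """Reciprocal of the weighted sum of MRE and parameter estimation error."""
    return 1.0 / (alpha * mre + beta * parameter_error(theta_hat, theta_ref))

def select_best(candidates, theta_ref, alpha=1.0, beta=1.0):
    """Return the index of the highest-scoring candidate and all scores."""
    scores = [pose_set_score(c["mre"], c["theta_hat"], theta_ref, alpha, beta)
              for c in candidates]
    return int(np.argmax(scores)), scores

# Reference intrinsics in the order (fx, fy, cx, cy, k1, k2); values are hypothetical.
theta_ref = [1350.0, 1350.0, 640.0, 400.0, -0.38, 0.24]
candidates = [
    # Slightly larger MRE, but parameters close to the reference.
    {"mre": 0.10, "theta_hat": [1351.0, 1349.0, 641.0, 401.0, -0.379, 0.241]},
    # Smaller MRE, but parameters far from the reference.
    {"mre": 0.09, "theta_hat": [1400.0, 1400.0, 660.0, 420.0, -0.30, 0.10]},
]
best, scores = select_best(candidates, theta_ref)
print(best, [round(s, 2) for s in scores])
```

The second candidate would win on MRE alone, while the combined score prefers the first, which is the behavior the score function is designed to produce.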

Figure 3: Calibration with pose guidance system user interface.

2.3 Optimal 3D Virtual Pose Set Deployment

In this section, we discuss how the optimal pose set is projected onto the calibration interface to guide a user to pose the calibration pattern board appropriately. Fig. 3 shows an example of how a user is guided to move a chessboard, match the expected pose, and capture a qualified image. To capture a qualified image, the user needs to move the calibration pattern around until its image on screen matches the displayed guided pose. If the average distance of the four outermost corners is less than a threshold, where distance is defined as the pixel distance between the expected position and the current position, the user's pose is considered to match the guided pose. Once the user matches the guided pose, the system captures the current frame and shows the next guided pose. The procedure repeats until all images are captured. Our solution supports both automatic capture and manual capture, for example by pressing a specific key on the keyboard.
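The matching rule described above can be sketched as follows. This is a minimal illustration under assumptions: the detected and expected corners are given as row-major arrays of pixel coordinates, the 15-pixel threshold is a hypothetical value (the text does not state one), and the function names are illustrative.

```python
import numpy as np

def outer_corners(corners, cols, rows):
    """Pick the four outermost corners from a row-major (rows*cols, 2) corner array."""
    grid = np.asarray(corners, dtype=float).reshape(rows, cols, 2)
    return np.stack([grid[0, 0], grid[0, -1], grid[-1, -1], grid[-1, 0]])

def pose_matched(detected, expected, cols, rows, threshold_px=15.0):
    """Compare the mean pixel distance of the four outermost corners to a threshold."""
    d = outer_corners(detected, cols, rows)
    e = outer_corners(expected, cols, rows)
    mean_dist = float(np.mean(np.linalg.norm(d - e, axis=1)))
    return mean_dist < threshold_px, mean_dist

# Hypothetical guided pose: a 9x6 corner grid with 40 px spacing on screen.
cols, rows = 9, 6
gx, gy = np.meshgrid(np.arange(cols), np.arange(rows))
expected = np.stack([100.0 + 40.0 * gx.ravel(), 80.0 + 40.0 * gy.ravel()], axis=1)

near = expected + np.array([3.0, 4.0])     # board is 5 px away from the guided pose
far = expected + np.array([30.0, 40.0])    # board is 50 px away from the guided pose

print(pose_matched(near, expected, cols, rows))
print(pose_matched(far, expected, cols, rows))
```

In an interactive tool, a check like this would run on every video frame, and the per-corner offsets could drive the on-screen adjustment arrows.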

3 Experiments

We evaluate our proposed camera calibration system from multiple perspectives. First, we evaluate our proposed score function to demonstrate its capability to select an optimal pose set that improves calibration accuracy. Second, we report results that demonstrate the calibration accuracy, robustness, and reproducibility of our method. Finally, we demonstrate that our calibration tool is applicable to a variety of lenses (Len1: HFOV=80°, VFOV=60°, resolution=1280x800, format=IR; Len2: HFOV=120°, VFOV=100°, resolution=1920x1208, format=RGB).

3.1 Score Function Evaluation

To evaluate the effectiveness of our optimal pose selection in Eq.(5), which uses the score function in Eq.(4), we simulate a virtual camera with known intrinsic parameters and use a chessboard as the calibration pattern. The pose selection space $S$ is chosen based on the working space range of the specific use case, and candidate pose sets are generated in $S$. Eq.(1) is used to project the pose sets onto the 2D image and Eq.(4) is used to compute the score for each pose set.

Table 1 shows the pose sets with maximum and minimum Mean Re-projection Error (MRE) and with maximum and minimum proposed score, together with their calibration parameter estimation errors. Comparing the first row and the fourth row shows that a pose set with smaller MRE does not necessarily have a smaller parameter estimation error. In contrast to MRE, our score favors pose sets with both smaller MRE and smaller parameter variance, which gives a much better evaluation of calibration accuracy.

| Pose set | MRE | Score in Eq.(4) | Parameter estimation error |
| --- | --- | --- | --- |
| PS(min MRE) | 0.0970 | 1.3689 | 0.6336 |
| PS(max MRE) | 0.1679 | 1.0820 | 0.7562 |
| PS(min score) | 0.1440 | 0.2052 | 4.729 |
| PS(max score) | 0.1186 | 4.8216 | 0.088 |

Table 1: Comparison of pose sets (PS) with maximum and minimum MRE, and with maximum and minimum score.
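As a quick arithmetic check of Table 1, the sketch below recomputes each score as the reciprocal of MRE plus parameter estimation error. It assumes that the three numeric columns are MRE, score, and parameter estimation error in that order, and that both weights in Eq.(4) equal 1; neither is stated explicitly in the text.

```python
# Rows of Table 1 as (MRE, reported score, parameter estimation error).
# The column order and alpha = beta = 1 are assumptions, not stated in the text.
table1 = {
    "PS(min MRE)": (0.0970, 1.3689, 0.6336),
    "PS(max MRE)": (0.1679, 1.0820, 0.7562),
    "PS(min score)": (0.1440, 0.2052, 4.729),
    "PS(max score)": (0.1186, 4.8216, 0.088),
}

def score(mre, param_err, alpha=1.0, beta=1.0):
    """Reciprocal of the weighted sum of MRE and parameter estimation error."""
    return 1.0 / (alpha * mre + beta * param_err)

rel_diffs = {}
for name, (mre, reported, err) in table1.items():
    recomputed = score(mre, err)
    rel_diffs[name] = abs(recomputed - reported) / reported
    print(f"{name}: reported {reported:.4f}, recomputed {recomputed:.4f}, "
          f"relative difference {rel_diffs[name]:.2%}")
```

Under these assumptions the recomputed scores agree with the reported ones to within about 1%, with the remaining gap attributable to rounding of the tabulated values.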
Figure 4: Calibration accuracy and variance comparison between OpenCV and our calibration system. The mean re-projection errors over 10 users' trials are indicated by red horizontal bars, and the full range by blue boxes.
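| Lens | Parameter | OpenCV Mean | OpenCV Std | Ours Mean | Ours Std |
| --- | --- | --- | --- | --- | --- |
| Len1 | $f_x$ | 1357.8 | 23.3 | 1350.1 | 2.6 |
| Len1 | $f_y$ | 1356.9 | 19.5 | 1352.1 | 2.8 |
| Len1 | $c_x$ | 660.6 | 14.2 | 657.6 | 2.7 |
| Len1 | $c_y$ | 411.4 | 27.7 | 383.8 | 4.9 |
| Len1 | $k_1$ | -0.3829 | 0.0105 | -0.3774 | 0.0047 |
| Len1 | $k_2$ | 0.2436 | 0.2715 | 0.2425 | 0.0472 |
| Len2 | $f_x$ | 976.7 | 13.8 | 972.5 | 2.9 |
| Len2 | $f_y$ | 977.6 | 13.4 | 974.4 | 2.9 |
| Len2 | $c_x$ | 963.1 | 9.5 | 954.9 | 4.2 |
| Len2 | $c_y$ | 633.8 | 8.0 | 644.4 | 1.0 |
| Len2 | $k_1$ | -0.3591 | 0.0188 | -0.3454 | 0.0059 |
| Len2 | $k_2$ | 0.1777 | 0.0375 | 0.1398 | 0.0092 |

Table 2: Mean and standard deviation of the estimated focal lengths, focal centers and distortion parameters (only $k_1$ and $k_2$ are listed here for illustration) over all trials in the human study. While the mean values of the parameter estimates are similar for OpenCV and our method, our method yields a much smaller standard deviation.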

3.2 Calibration Accuracy and Reproducibility

We invited 10 participants without any previous calibration experience to calibrate various camera lenses with two different methods: 1) the OpenCV calibration toolbox [2] and 2) Calibration with Pose Guidance. For OpenCV calibration, participants were given printed instructions describing how to change the chessboard pose, and we showed them sample poses from the OpenCV calibration tutorial website. Participants asked questions such as by how many degrees the chessboard pose should be changed and how far the chessboard should be from the camera, which shows that participants in general need additional instructions to use the OpenCV calibration tool.

Fig. 4 shows the re-projection error obtained with the different calibration tools. Our proposed calibration system achieves a much smaller re-projection error and smaller variance. Table 2 shows detailed statistics, listing the mean and standard deviation of the intrinsic parameters estimated by the 10 participants. Our system yields a smaller standard deviation for every listed parameter, by factors of roughly 2 to 9. In summary, our system provides: 1) a smaller re-projection error, which indicates higher calibration accuracy, and 2) a smaller parameter estimation variance among different trials, which demonstrates the stability and reproducibility of our calibration method.
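To put a number on the reduction in spread, the following sketch computes the ratio of the OpenCV standard deviation to ours for each row of Table 2. It assumes the rows of each lens block are ordered $f_x$, $f_y$, $c_x$, $c_y$, $k_1$, $k_2$, which is inferred from the table caption rather than stated in it.

```python
# (OpenCV std, our std) for each parameter in Table 2.
# The row labels are an assumption inferred from the caption.
table2_std = {
    "Len1": {"fx": (23.3, 2.6), "fy": (19.5, 2.8), "cx": (14.2, 2.7),
             "cy": (27.7, 4.9), "k1": (0.0105, 0.0047), "k2": (0.2715, 0.0472)},
    "Len2": {"fx": (13.8, 2.9), "fy": (13.4, 2.9), "cx": (9.5, 4.2),
             "cy": (8.0, 1.0), "k1": (0.0188, 0.0059), "k2": (0.0375, 0.0092)},
}

ratios = {}
for lens, params in table2_std.items():
    for name, (std_opencv, std_ours) in params.items():
        ratios[(lens, name)] = std_opencv / std_ours
        print(f"{lens} {name}: OpenCV std is {ratios[(lens, name)]:.1f}x ours")

print(f"smallest ratio {min(ratios.values()):.1f}, largest ratio {max(ratios.values()):.1f}")
```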

4 Conclusions

We propose camera calibration with pose guidance to improve calibration accuracy, reduce calibration variance, and reduce training time for novices. We propose a novel score function to select an optimal pose set, which reduces both re-projection error and intrinsic parameter estimation variance. Our proposed calibration system is evaluated against a widely used calibration tool, and multiple experiments demonstrate its accuracy and robustness.


References

  • [1] J. Bouguet (2004) Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.
  • [2] G. Bradski and A. Kaehler (2000) OpenCV. Dr. Dobb's Journal of Software Tools 3.
  • [3] T. Buchanan (1988) The twisted cubic and camera calibration. Computer Vision, Graphics, and Image Processing 42 (1), pp. 130–132.
  • [4] Y. Chen, F. Huang, F. Shi, B. Liu, and H. Yu (2019) Plane chessboard-based calibration method for a LWIR ultra-wide-angle camera. Applied Optics 58 (4), pp. 744–751.
  • [5] H. Ha, M. Perdoch, H. Alismail, I. So Kweon, and Y. Sheikh (2017) Deltille grids for geometric camera calibration. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5344–5352.
  • [6] P. Hammarstedt, P. Sturm, and A. Heyden (2005) Degenerate cases and closed-form solutions for camera calibration with one-dimensional objects. In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, Vol. 1, pp. 317–324.
  • [7] L. Huang, F. Da, and S. Gai (2019) Research on multi-camera calibration and point cloud correction method based on three-dimensional calibration object. Optics and Lasers in Engineering 115, pp. 32–41.
  • [8] T. Li, F. Hu, and Z. Geng (2011) Geometric calibration of a camera-projector 3D imaging system. In Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry, pp. 187–194.
  • [9] H. Liu, H. Lin, and L. Yao (2017) Calibration method for projector-camera-based telecentric fringe projection profilometry system. Optics Express 25 (25), pp. 31492–31508.
  • [10] L. R. Ramírez-Hernández, J. C. Rodríguez-Quiñonez, M. J. Castro-Toscano, D. Hernández-Balbuena, W. Flores-Fuentes, R. Rascón-Carmona, L. Lindner, and O. Sergiyenko (2020) Improve three-dimensional point localization accuracy in stereo vision systems using a novel camera calibration method. International Journal of Advanced Robotic Systems 17 (1), pp. 1729881419896717.
  • [11] Y. Ren (2013) Techniques for vanishing point detection. University of Southern California.
  • [12] A. Richardson, J. Strom, and E. Olson (2013) AprilCal: assisted and repeatable camera calibration. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1814–1821.
  • [13] P. Rojtberg and A. Kuijper (2018) Efficient pose selection for interactive camera calibration. In 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 31–36.
  • [14] A. Sagitov, K. Shabalina, L. Sabirova, H. Li, and E. Magid (2017) ARTag, AprilTag and CALTag fiducial marker systems: comparison in a presence of partial marker occlusion and rotation. In ICINCO (2), pp. 182–191.
  • [15] J. Sochor, R. Juránek, and A. Herout (2017) Traffic surveillance camera calibration by 3D model bounding box alignment for accurate vehicle speed measurement. Computer Vision and Image Understanding 161, pp. 87–98.
  • [16] P. F. Sturm and S. J. Maybank (1999) On plane-based camera calibration: a general algorithm, singularities, applications. In Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Vol. 1, pp. 432–437.
  • [17] P. Sturm (2000) A case against Kruppa's equations for camera self-calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10), pp. 1199–1204.
  • [18] L. Tan, Y. Wang, H. Yu, and J. Zhu (2017) Automatic camera calibration using active displays of a virtual pattern. Sensors 17 (4), pp. 685.
  • [19] B. Triggs (1998) Autocalibration from planar scenes. In European Conference on Computer Vision, pp. 89–105.
  • [20] P. Wang, J. Wang, J. Xu, Y. Guan, G. Zhang, and K. Chen (2017) Calibration method for a large-scale structured light measurement system. Applied Optics 56 (14), pp. 3995–4002.
  • [21] Y. Xie, R. Shao, P. Guli, B. Li, and L. Wang (2018) Infrastructure based calibration of a multi-camera and multi-lidar system using AprilTags. In 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 605–610.
  • [22] Z. Zhang (2000) A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11), pp. 1330–1334.