Many 3D computer vision applications require knowledge of the camera parameters to relate the 3D world to the acquired 2D image(s). The process of estimating these parameters is called camera calibration, in which two groups of parameters (intrinsic and extrinsic) are estimated.
In order to calibrate a camera, conventional calibration methods need to acquire some information from the real 3D world using calibration objects such as grids, wands, or LEDs. This imposes a major limitation on the calibration task, since the camera can be calibrated only in off-line and controlled environments. To address this issue, Maybank and Faugeras [9, 7] proposed the so-called self-calibration approach, in which they used the information of matched points in several images taken by the same camera from different views instead of known 3D points (calibration objects). In their two-step method, they first estimated the epipolar transformation from three pairs of views, and then linked it to the image of the absolute conic using the Kruppa equations. Not long after the seminal work of Maybank and Faugeras, Basu proposed the idea of Active Calibration [1, 2], in which he introduced the concept of active camera motions and eliminated point-to-point correspondences.
The main downside of the Active Calibration strategies (A and B) in [2, 1, 3] is that they calculate the camera intrinsics using a component of the projection equation on which a constraint is imposed by the degenerate rotations. For example, after panning the camera, the equation derived from vertical variations observed in the new image plane is unstable. Furthermore, the small angle approximation ($\sin\theta \approx \theta$ and $\cos\theta \approx 1$) decreases the accuracy of these strategies when the angle of rotation is not very small. Also, rolling the camera is impractical (without a precise mechanical device) because it creates translational offsets in the camera center. In this paper, we propose a Simplified Active Calibration (SAC) formulation in which the equations are closed-form and linear. To overcome the instability caused by degenerate rotations in Active Calibration, we calculate the focal length in each direction separately. In addition, we do not use the small angle approximation of replacing $\sin\theta$ with $\theta$ and $\cos\theta$ with 1; hence, our formulation refers only to the elements of the rotation matrix. Moreover, the proposed method is more practical because it does not require a roll rotation of the camera; only pan and tilt rotations, which can be easily acquired using PTZ cameras, are sufficient.
2 Simplified Active Calibration
Simplified Active Calibration (SAC) has been inspired by the idea of approximating the camera intrinsics using small angle rotations of the camera, which was initially proposed in [1, 2] and extended in [3, 4]. Imposing three constraints on the translation of the camera generates a pure rotation motion. In addition, using small angle rotations allows us to ignore some non-linear terms in order to estimate the remaining linear parameters. The estimated intrinsics can then be used as an initial guess in non-linear refinement processes.
In general, SAC can be used in any platform in which information about the camera motion is provided by the hardware, such as in robotic applications where the rotation of the camera can be extracted from the inertial sensors, or in surveillance control software that can rotate PTZ cameras by specific angles. Having access to the rotation of the camera, we propose a two-step process to estimate the focal length of the camera. In the first step, we present a closed-form solution to calculate an approximation of the focal length in the $v$ direction ($f_v$) using an image taken after a pan rotation of the camera, assuming that $u$ and $v$ represent the two major axes of the image plane. In the second step, we estimate the focal length of the camera in the $u$ direction ($f_u$) using an image taken after a tilt rotation of the camera. Therefore, to estimate the two main components (focal lengths) of the intrinsic matrix, namely $f_u$ and $f_v$, two pairs of images are required: one taken before and after a small pan rotation, and another taken before and after a small tilt rotation.
2.1 Focal Length in the v Direction
We assume that the camera is located at the origin of the Cartesian coordinate system and is looking along the optical axis, on which the principal point is specified. Every 3D point in the world that is visible to the camera is projected onto a point $(u, v)$ of the image plane, where the coordinates of the principal point are denoted by $(u_0, v_0)$. With modern cameras it is reasonable to assume that image pixels are square, so that the value of the camera skew is zero.
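Under these assumptions (square pixels, zero skew), the intrinsic matrix has only the two focal lengths and the principal point as unknowns. A minimal sketch of this camera model, with illustrative numeric values (the symbols $f_u$, $f_v$, $u_0$, $v_0$ follow the notation above; the axis-to-coordinate mapping in the code is generic):

```python
# Pinhole camera model with zero skew and square pixels:
# the intrinsic matrix K contains only f_u, f_v, u_0, v_0.

def make_K(f_u, f_v, u0, v0):
    return [[f_u, 0.0, u0],
            [0.0, f_v, v0],
            [0.0, 0.0, 1.0]]

def project(K, X):
    """Project a 3D point X = (x, y, z), given in the camera frame,
    to pixel coordinates (u, v)."""
    x, y, z = X
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v

# Illustrative values: 800-pixel focal lengths, 640x480 image.
K = make_K(800.0, 800.0, 320.0, 240.0)
print(project(K, (0.1, -0.05, 2.0)))  # a point 2 m in front of the camera
```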
Every point $p = (u, v)$ in an image seen by a stationary camera (one that rotates freely but stays in a fixed location) is transformed to a point $p' = (u', v')$ in another image taken after the camera rotation. When the camera is panned, the relationship between $p$ and $p'$ is given by applying the rotation homography $K R K^{-1}$ to the homogeneous coordinates of $p$, and after expanding the equation, the relationship is represented by:
where $r_{ij}$ is the element at row $i$ and column $j$ of the rotation matrix around the $y$-axis. After simplification of Eq.2:
Note that after a pure pan rotation, the $u$ coordinates of the new image will not be affected by the transformation. (The reader is referred to  for a detailed explanation and analysis of this fact.) In other words, image pixels only move horizontally. Thus, the rate of change in the $u$ direction before and after the pan rotation is close to one, viz:
The above substitution changes the value of the denominator to 1 and hence simplifies the whole projection equation.
Knowing that the principal point is close to the center of the image $(H/2, W/2)$, where $H$ and $W$ represent the image height and width respectively, we replace $v_0$ with $W/2$ in Eq.6. Thus, we can derive a suitable linear equation to estimate the focal length in the $v$ direction from an image taken after a pan rotation.
Eq.7 needs only one point $(u, v)$ in the reference image that corresponds to $(u', v')$ in the transformed image. If there are more point correspondences, we can easily use the average of these points to obtain a more robust result.
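The closed-form estimate can be checked numerically. The sketch below assumes a linearized relation of the form $f_v \approx (v' - v_0 - (v - v_0)\cos\theta)/\sin\theta$, which is my reconstruction of Eq.7 from the surrounding derivation (the published equation may differ in sign convention); the correspondence itself is generated with the exact rotation homography:

```python
import math

def pan_correspondence(f_v, v0, v, theta):
    """Map the horizontal pixel coordinate v through an exact pan rotation.

    Restriction of the rotation-only homography K R K^{-1} to the
    horizontal axis: with x = (v - v0) / f_v,
    v' = v0 + f_v (x cos t + sin t) / (cos t - x sin t).
    """
    x = (v - v0) / f_v
    return v0 + f_v * (x * math.cos(theta) + math.sin(theta)) / \
        (math.cos(theta) - x * math.sin(theta))

def estimate_f_v(v, v_prime, v0, theta):
    """Linearized closed-form estimate (denominator of the homography
    taken as 1, as in the simplification above)."""
    return (v_prime - v0 - (v - v0) * math.cos(theta)) / math.sin(theta)

f_true, v0 = 800.0, 320.0          # illustrative ground truth, 640-wide image
theta = math.radians(3.0)          # small pan angle
v = 340.0                          # a point near the image center
v_prime = pan_correspondence(f_true, v0, v, theta)
f_est = estimate_f_v(v, v_prime, v0, theta)
print(f_est)  # close to 800 for small angles and near-center points
```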
2.2 Focal Length in the u Direction
So far, we have estimated $f_v$ from the information provided by an image taken after a pan rotation. We repeat the same procedure to approximate $f_u$. This time, we need an image taken after a pure tilt rotation of the camera. Thus, the projection equation is characterized by replacing the pan rotation matrix with the rotation matrix that describes rotation of the camera around the $x$-axis. Following the same reasoning as in Section 2.1, a closed-form solution to estimate the focal length of the camera in the $u$ direction is obtained by:
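The tilt case mirrors the pan case axis for axis. A self-contained sketch under the same assumptions as before (the linearized estimator is my reconstruction of Eq.8, and all numeric values are illustrative):

```python
import math

def tilt_correspondence(f_u, u0, u, phi):
    """Map the vertical pixel coordinate u through an exact tilt rotation
    (rotation about the camera's horizontal axis); x = (u - u0) / f_u."""
    x = (u - u0) / f_u
    return u0 + f_u * (x * math.cos(phi) + math.sin(phi)) / \
        (math.cos(phi) - x * math.sin(phi))

def estimate_f_u(u, u_prime, u0, phi):
    """Linearized closed-form estimate, mirroring the pan-based one."""
    return (u_prime - u0 - (u - u0) * math.cos(phi)) / math.sin(phi)

f_true, u0 = 820.0, 240.0          # illustrative ground truth, 480-tall image
phi = math.radians(2.5)            # small tilt angle
u = 250.0
u_prime = tilt_correspondence(f_true, u0, u, phi)
f_est = estimate_f_u(u, u_prime, u0, phi)
print(f_est)  # close to 820
```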
3 Results and Analysis
Based on our proposed method, the focal lengths in the $v$ and $u$ directions can be estimated using Eq.7 and Eq.8, respectively. Only one point correspondence is required to calculate each focal length. Fig.1 shows the estimated focal lengths using various pan and tilt angles on a 3D synthetic scene of a teapot taken by a simulated camera. It can be seen that when the pan and tilt angles are small, the estimated focal lengths are very close to the ground truth.
In another experiment, we evaluate the proposed simplified active calibration formulation on 1000 different runs of 500 randomly generated 3D points for small pan and tilt angles. The mean and standard deviation of the results obtained are shown in Table 1. As we can see, our proposed active calibration formulation attains results very close to the ground truth. Specifically, the error in the focal length estimates is less than 1 pixel.
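This kind of randomized experiment can be reproduced in miniature. The sketch below is hedged: the linearized estimator is my reconstruction of Eq.7 (with $v$ as the horizontal image axis and $v_0 = W/2$), random horizontal coordinates stand in for projected scene points, and the correspondences are generated with the exact pan homography:

```python
import math
import random

def pan_correspondence(f_v, v0, v, theta):
    # Exact horizontal mapping induced by a pure pan of a rotating camera.
    x = (v - v0) / f_v
    return v0 + f_v * (x * math.cos(theta) + math.sin(theta)) / \
        (math.cos(theta) - x * math.sin(theta))

def estimate_f_v(v, v_prime, v0, theta):
    # Linearized closed-form estimate (assumed form of Eq.7).
    return (v_prime - v0 - (v - v0) * math.cos(theta)) / math.sin(theta)

random.seed(0)
f_true, v0 = 800.0, 320.0
theta = math.radians(2.0)
estimates = []
for _ in range(500):
    v = random.uniform(v0 - 50.0, v0 + 50.0)   # random projected points
    v_prime = pan_correspondence(f_true, v0, v, theta)
    estimates.append(estimate_f_v(v, v_prime, v0, theta))
mean_f = sum(estimates) / len(estimates)
std_f = (sum((e - mean_f) ** 2 for e in estimates) / len(estimates)) ** 0.5
print(mean_f, std_f)  # mean close to the true focal length of 800
```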
3.1 Angular Uncertainty
Acquiring the rotation angles requires either specific devices such as gyroscopes or a specially designed camera called a PTZ camera. Even using these devices does not guarantee that the extracted rotation angles are noise-free. To simulate the noisy conditions of a real-world application, we contaminated the angles of the above-mentioned teapot sequences with increasing angular errors.
While the point correspondences are kept fixed for all of the pan and tilt rotations, we calculate the focal lengths (Eq.7 and Eq.8) using the contaminated pan and tilt angles. The results are shown in Fig.2. Specifically, Fig.2(a) and Fig.2(b) show the error of our proposed formulas for estimating the focal lengths when the pan and tilt angles are not accurate. Every sequence is colored based on its rotation angle, ranging from blue for smaller angles to red for larger angles. Fig.2(a) and Fig.2(b) illustrate that the sequences taken with smaller angles have steeper error slopes than the sequences acquired with larger rotation angles. This shows that the focal length estimates are more sensitive to angular noise when the camera is rotated by smaller angles.
Overall, when the camera is rotated by small angles, the influence of angular noise on the SAC equations is higher. On the other hand, SAC relies on the benefit of rotating the camera by small angles. Therefore, to avoid magnifying the effect of noise, it is important not to rotate the camera by very small angles.
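The trade-off can be illustrated with the same synthetic setup (again a hedged sketch; the linearized estimator is my reconstruction of Eq.7): a fixed angular error of 0.1 degrees is injected into the reported pan angle, and the resulting focal-length error shrinks as the true rotation grows.

```python
import math

def pan_correspondence(f_v, v0, v, theta):
    # Exact horizontal mapping induced by a pure pan rotation.
    x = (v - v0) / f_v
    return v0 + f_v * (x * math.cos(theta) + math.sin(theta)) / \
        (math.cos(theta) - x * math.sin(theta))

def estimate_f_v(v, v_prime, v0, theta):
    # Linearized closed-form estimate (assumed form of Eq.7).
    return (v_prime - v0 - (v - v0) * math.cos(theta)) / math.sin(theta)

f_true, v0, v = 800.0, 320.0, 330.0
noise = math.radians(0.1)          # fixed error added to the reported angle
errors = []
for deg in (0.5, 1.0, 2.0, 4.0, 8.0):
    theta = math.radians(deg)
    v_prime = pan_correspondence(f_true, v0, v, theta)
    f_est = estimate_f_v(v, v_prime, v0, theta + noise)  # contaminated angle
    errors.append(abs(f_est - f_true))
print(errors)  # the error decreases as the rotation angle grows
```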
3.2 Point Correspondence Noise
Another type of noise that affects the SAC equations is the noise in the location of the features used for matching. To simulate such conditions, we assume that the location of every teapot point is disturbed by Gaussian noise with zero mean and variance $\sigma^2$. Then, we calibrate the camera using SAC for a range of values of $\sigma$. The intrinsic parameters obtained are illustrated in Fig.3.
Fig.3(a) and Fig.3(b) illustrate the influence of pixel noise on the estimation of the focal lengths (Eq.7 and Eq.8). Colors are distributed based on the rotation angles of the camera; hence, the distribution of the colors reveals how noise affects the SAC equations. In fact, the high concentration of red, yellow, and orange points around the zero-error line in Fig.3(a) and (b) reveals that when the angle of the camera rotation is not very small, SAC achieves low-error estimates of the focal lengths. This corroborates the claim that very small camera rotations cause the SAC formulations to produce high-error results.
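The sensitivity to localization error can be seen directly from the closed form: the estimate divides by $\sin\theta$, so a one-pixel error in the matched point perturbs the focal length by roughly $1/\sin\theta$ pixels. A deterministic sketch (one pixel of error instead of Gaussian noise; the estimator is my reconstruction of Eq.7):

```python
import math

def pan_correspondence(f_v, v0, v, theta):
    # Exact horizontal mapping induced by a pure pan rotation.
    x = (v - v0) / f_v
    return v0 + f_v * (x * math.cos(theta) + math.sin(theta)) / \
        (math.cos(theta) - x * math.sin(theta))

def estimate_f_v(v, v_prime, v0, theta):
    # Linearized closed-form estimate (assumed form of Eq.7).
    return (v_prime - v0 - (v - v0) * math.cos(theta)) / math.sin(theta)

f_true, v0, v = 800.0, 320.0, 330.0
pixel_noise = 1.0                  # one pixel of localization error
errs = []
for deg in (0.5, 2.0, 8.0):
    theta = math.radians(deg)
    v_prime = pan_correspondence(f_true, v0, v, theta) + pixel_noise
    errs.append(abs(estimate_f_v(v, v_prime, v0, theta) - f_true))
print(errs)  # sensitivity shrinks roughly like 1/sin(theta)
```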
3.3 Real Images
We studied the proposed SAC formulations on real images as well. We used the Canon VC-C50i PTZ camera, which is able to freely rotate around the $y$-axis (pan) and the $x$-axis (tilt). The camera can be controlled by a host computer through standard RS-232 serial communication. Therefore, the required pan and tilt rotation angles can be set in a specific packet and then written into the camera's serial buffer to rotate the camera by the assigned angles.
Using the above-mentioned procedure, we captured four sequences of images for evaluating the proposed SAC formulations. Fig.4 shows a sequence of our bookshelf scene. All sequences were taken using a fixed zoom. While keeping the zoom of the camera unchanged, another 30 images were acquired from various viewpoints of a checkerboard pattern. The ground truth intrinsic parameters were found by applying the method of Zhang [10] on the checkerboard images.
The performance of the SAC formulations on the four sequences of real images is reported in Table 2. For every sequence, we only used the images in that sequence. For example, to calculate the focal length in the $v$ direction of Sequence 1, we found the point correspondences between a reference image and the image taken after the pan rotation of the camera (Fig.4(a)). Then, we used only the matched point that is closest to the center of the image. Although we did not include the lens distortion parameter in the SAC formulation (because it creates non-linear equations), we reduce the inaccuracy of the focal length estimates by using a matched point close to the center of the image, which is thus less affected by lens distortion. A similar procedure was adopted with the image taken after a tilt rotation of the camera (Fig.4(b)) for calculating the focal length in the $u$ direction of Sequence 1.
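The selection step described above can be sketched as a simple minimum-distance search over the matched points (a small illustrative helper, not the authors' code; the coordinates are made up):

```python
def closest_to_center(points, center):
    """Return the matched point nearest the image center, where lens
    distortion is smallest."""
    cx, cy = center
    return min(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)

# Hypothetical matches in a 640x480 image; pick the most central one.
matches = [(10.0, 12.0), (300.0, 230.0), (620.0, 470.0)]
print(closest_to_center(matches, (320.0, 240.0)))  # -> (300.0, 230.0)
```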
The errors reported by applying SAC on four different sequences of real images in Table 2 show that despite the presence of various types of noise, such as angular uncertainties, point correspondence noise, and lens distortion, the focal lengths estimated by SAC are close to the results of the method of Zhang [10], except when the angles of rotation are very small.
4 Conclusion
Inspired by the idea of calibrating a camera through active movements of the camera, in this paper we presented a Simplified Active Calibration formulation. Our study provides closed-form, linear equations to estimate the parameters of the camera using two image pairs taken before and after panning and tilting the camera.
A basic assumption about the rotation of a fixed camera was made; i.e., to solve the proposed equations, the rotation angles of the camera must be known. The proposed formulation can be used in practical applications such as surveillance, because accessing the camera motion information in PTZ and mobile phone cameras is straightforward.
The proposed closed-form formulations for estimating the focal lengths can be solved with only one point correspondence. Finding the corresponding point is straightforward; due to recent developments in feature extractors, one may use [6, 5] to extract repeatable regions from a pair of images. This is especially useful for applications that prefer no point correspondences, where instead of the reference and transferred points in Eq.7 and Eq.8, the average of the edge points or the centroid of the regions can be used.
The results of solving our proposed formulations on randomly simulated 3D scenes indicated a very low error rate in estimating the focal lengths. We evaluated our proposed SAC formulation for two different noise conditions, namely angular and pixel noise. The simulated results showed that if the angle of rotation is not very small, the SAC formulation can robustly estimate the focal lengths. This conclusion was later verified in our experiment with real images. Our future work will focus on deriving linear equations for calculating the location of the principal point and also including lens distortion parameters into the Simplified Active Calibration equations.
References
-  A. Basu. Active calibration. In Robotics and Automation, 1993 IEEE International Conference on, pages 764–769. IEEE, 1993.
-  A. Basu. Active calibration: Alternative strategy and analysis. In Computer Vision and Pattern Recognition, 1993 IEEE Computer Society Conference on (CVPR'93), pages 495–500. IEEE, 1993.
-  A. Basu. Active calibration of cameras: theory and implementation. IEEE Transactions on Systems, man, and cybernetics, 25(2):256–265, 1995.
-  A. Basu and K. Ravi. Active camera calibration using pan, tilt and roll. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 27(3):559–566, 1997.
-  M. Faraji, J. Shanbehzadeh, K. Nasrollahi, and T. Moeslund. Extremal regions detection guided by maxima of gradient magnitude. Image Processing, IEEE Transactions on, 2015.
-  M. Faraji, J. Shanbehzadeh, K. Nasrollahi, and T. B. Moeslund. Erel: Extremal regions of extremum levels. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 681–685. IEEE, 2015.
-  O. D. Faugeras, Q.-T. Luong, and S. J. Maybank. Camera self-calibration: Theory and experiments. In European conference on computer vision, pages 321–334. Springer, 1992.
-  I. N. Junejo and H. Foroosh. Optimizing PTZ camera calibration from two images. Machine Vision and Applications, 23(2):375–389, 2012.
-  S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. International Journal of Computer Vision, 8(2):123–151, 1992.
-  Z. Zhang. Flexible camera calibration by viewing a plane from unknown orientations. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 1, pages 666–673. IEEE, 1999.