 # Simplified Active Calibration

We present a new mathematical formulation to estimate the intrinsic parameters of a camera in active or robotic platforms. We show that the focal lengths can be estimated using only one point correspondence that relates images taken before and after a degenerate rotation of the camera. The estimated focal lengths are then treated as known parameters to obtain a linear set of equations to calculate the principal point. Assuming that the principal point is close to the image center, the accuracy of the linear equations is increased by integrating the image center into the formulation. We extensively evaluate the formulations on a simulated camera, 3D scenes, and real-world images. Our error analysis over simulated and real images indicates that the proposed Simplified Active Calibration method estimates the parameters of a camera with low error rates that can be used as an initial guess for further non-linear refinement procedures. Given the proposed closed-form solutions, Simplified Active Calibration can also be employed in real-time environments for automatic calibration.


## 1 Introduction

Camera calibration is an essential step in many 3D computer vision applications where we need to calculate how the 3D world is projected onto a 2D image. Camera calibration aims not only to estimate the intrinsic camera parameters such as focal length, center of projection, pixel skew and aspect ratio, but also the camera motions, i.e., the rotation and translation of the camera.

In order to calibrate a camera, conventional calibration methods need to acquire some information from the real 3D world using calibration objects such as grids, wands, LEDs, or even by adding augmented reality markers to a camera zhao2018marker . This imposes a major limitation on the calibration task since the camera can be calibrated only in off-line and controlled environments. To address this issue, Maybank and Faugeras maybank1992theory ; faugeras1992camera proposed the so-called self-calibration approach in which they used the information of matched points in several images taken by the same camera from different views instead of using known 3D points (calibration objects). In their two-step method, they first estimated the epipolar transformation from three pairs of views, and then linked it to the image of an absolute conic using the Kruppa equations maybank1992theory . Not long after the seminal work of Maybank and Faugeras, Basu proposed the idea of Active Calibration basu1993active2 ; basu1993active in which he included rotations of a camera and eliminated point-to-point correspondences.

An active environment can change the characteristics of a problem. For instance, an ill-posed and nonlinear problem for a passive observer can become well-defined and linear for an active observer aloimonos1988active . Thus, to successfully calibrate the camera, Active Calibration needs to control the camera motion. This makes it a perfect choice for on-line platforms like robotics or surveillance, where the internal parameters might change due to focusing, zooming, or mechanical and thermal variations of the environment surrounding a camera. Therefore, knowing the motion of the camera is essential in Active Calibration and, as Hartley stated hartley1997self , “simplifies the calibration task enormously.” Another advantage of Active Calibration is its closed-form strategies that calculate the intrinsic parameters through only two pairs of images taken after panning and tilting the camera.

Other works du1993self ; dron1993dynamic ; stein1995accurate that used known camera motions were also published at almost the same time. Since the method of Maybank and Faugeras needed high accuracy in the computations hartley1994self ; hartley1997self and complicated rectification processes, and also because of the unavailability of the epipolar structure in scenes taken from a fixed point, Hartley hartley1994self proposed a method for self-calibrating a camera with constant intrinsics using projective distortions of several pure camera rotations. Inspired by Hartley’s work, Agapito et al. proposed a self-calibration method for cameras that freely rotate while changing their internal parameters by zooming agapito2001self . The notion of varying intrinsics has also been considered in pollefeys1999self but no assumption about the camera motion was made. Research in this area has expanded since the emergence of cell phone cameras capable of measuring the camera motion with gyroscopes and Inertial Measurement Units (IMUs), and of Pan-Tilt-Zoom (PTZ) cameras. Given a close approximation of the camera motion, several papers proposed new formulations for calibrating non-rotating stationary cameras elamsy2014self , and cameras with known motion frahm2003camera ; frahm2003cameraCal ; frahm2003robust . Some studies calibrate the camera by exercising specific types of control over camera rotations knight2003linear ; hua2000new ; junejo2008practical ; wan2010self . More recent methods proposed self-calibration formulations that include camera lens distortions wu2013keeping ; galego2012auto ; sun2016camera . Also, some researchers expanded the self-calibration formulation to robotic camera networks heng2015self or used a camera rotation observed by another camera as the pattern to calibrate the observing camera bruckner2014intrinsic . Human motion has also been considered as a way to deduce the camera parameters tresadern2008camera .
Figure 1: 3D scene and the simulated camera. a) A teapot in the 3D scene and its projected image on the simulated camera. b) The projected image of the teapot on the camera before (blue teapot) and after (red teapot) tilting the camera by 2.5°. c) The projected image of the teapot on the camera before (blue teapot) and after (red teapot) panning the camera by 2.5°. d) The projected image of the teapot on the camera before (blue teapot) and after (red teapot) panning the camera by 2.5° and then tilting the camera by 2.5°.

The main downside of Active Calibration Strategies A and B basu1993active ; basu1993active2 ; basu1995active is that they calculate the camera intrinsics using a component of the projection equation on which a constraint is imposed by the degenerate rotations. For example, after panning the camera, the equation derived from the vertical variations observed in the new image plane is unstable. Furthermore, the small-angle approximation, sin θ ≈ θ and cos θ ≈ 1, decreases the accuracy of the strategies when the angle of rotation is not very small. Also, rolling the camera basu1997active is impractical (without a precise mechanical device) because it creates translational offsets in the camera center. In this paper, we propose a Simplified Active Calibration (SAC) formulation in which the equations are closed-form and linear. To overcome the instability caused by using degenerate rotations in Active Calibration, we calculate the focal length in each direction separately faraji2018simplified . Then, through a mathematical derivation, we remove the corresponding degenerate component from the equation. In addition, we avoid the small-angle approximation; we do not replace sin θ with θ or cos θ with 1, so our formulation refers only to the elements of the rotation matrix. Moreover, the proposed method is more practical because it does not require a roll rotation of the camera; only pan and tilt rotations, which can easily be acquired using PTZ cameras, are sufficient.

The rest of the paper is organized as follows. In Section 2 we present our proposed Simplified Active Calibration formulation. Section 3 reports and analyzes the results of the proposed method on simulated and real scenes. Finally, our conclusions are drawn in Section 4.

## 2 Simplified Active Calibration

Simplified Active Calibration (SAC) has been inspired by the novel idea of approximating the camera intrinsics using small rotations of the camera which was initially proposed in basu1993active2 ; basu1993active and extended in basu1995active ; basu1997active . Imposing three constraints on the translation of the camera generates a pure rotation motion. In addition, using small rotation angles provides a condition suitable for ignoring some non-linear terms in order to estimate the remaining linear parameters. The estimated intrinsics can then be used as an initial guess in the non-linear refinement process.

Generally, SAC can be used in any platform in which information about the camera motion is provided by the hardware, such as in robotic applications where the rotation of the camera can be extracted from the inertial sensors, or in surveillance control software that is able to rotate PTZ cameras by specific angles. Having access to the rotation of the camera, we propose a 3-step process to calibrate the camera. In the first step, we present a closed-form solution to calculate an approximation of the focal length in the v direction (f_v) using an image taken after a pan rotation of the camera, assuming that v and u represent the two major axes of the image plane. In the second step, we estimate the focal length of the camera in the u direction (f_u) using an image taken after a tilt rotation of the camera. The third step consists of forming a system of linear equations to estimate the location of the principal point (v_0, u_0) in the image. We then have estimates for the four main components of the intrinsic matrix, namely f_v, f_u, v_0, and u_0. Thus, we require three pairs of images: one taken before and after a small pan rotation, one taken before and after a small tilt rotation, and one taken before and after a small pan-tilt rotation.

### 2.1 Rotation Formulation

Throughout the rest of the paper, we formulate the rotation of the camera by the Euler angles, in which every angle in the 3D coordinate system represents the amount of rotation about one of the coordinate axes and is denoted by a separate matrix. The final rotation matrix is thus computed as R = R_x R_y R_z, where R_x, R_y, and R_z denote the rotations about the x-, y-, and z-axes, respectively. This formulation implies that the resulting matrix

has three degrees of freedom. Also, the elements of the final rotation matrix are represented as:

R = [r_{ij}]_{3\times 3}   (1)

Where i denotes the row index and j the column index of the element.

In SAC, it is crucial to know the correct direction of the rotation matrix and its handedness, since it has to correspond to the acquired images; for example, which rotation matrix corresponds to an image acquired after panning the camera to the left. Because this is important for an elegant formulation and realistic results, we briefly explain every rotation matrix and its direction used throughout the paper.

#### 2.1.1 Roll

Roll is a rotation about the z-axis, used only in Strategies C and D of the original Active Calibration basu1997active . However, SAC does not need a rolled image, because rolling the camera while keeping the principal point fixed is impractical with current cameras. (Imposing a constraint on the translation along the x- or y-axis while rotating about the z-axis is very difficult and creates a translational offset ji2004self .) Therefore, a 3×3 identity matrix is used for R_z when calculating the final rotation matrix.

#### 2.1.2 Pan

Pan rotation of the camera represents a rotation about the y-axis and is computed using the following equation.

R_y = \begin{bmatrix} \cos\theta_p & 0 & -\sin\theta_p \\ 0 & 1 & 0 \\ \sin\theta_p & 0 & \cos\theta_p \end{bmatrix}   (2)

The direction implied by this rotation matrix in our predefined camera model is a clockwise orientation if one uses the right-hand rule. Therefore, rotating the camera to the right indicates a positive angle value. On the other hand, for a rotation to the left side the angle should have a negative sign.

#### 2.1.3 Tilt

By tilt rotation we mean a rotation of the camera about the x-axis, which can be achieved by the following matrix.

R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_t & \sin\theta_t \\ 0 & -\sin\theta_t & \cos\theta_t \end{bmatrix}   (3)

Unlike the pan rotation, tilt orientation is counter-clockwise considering the right-hand rule. So, if the camera rotates upward the angle is positive and if it rotates downward the angle of the rotation is negative.
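The two rotations above are easy to verify numerically. The sketch below (Python with NumPy; the 2.5° angles are arbitrary example values) builds R_y and R_x exactly as in Eq. 2 and Eq. 3, composes a pan-then-tilt motion with the roll fixed to the identity, and checks the orthonormality property used later in the projection equation:

```python
import numpy as np

def rot_pan(theta_p):
    """Pan: rotation about the y-axis (Eq. 2); positive = rotation to the right."""
    c, s = np.cos(theta_p), np.sin(theta_p)
    return np.array([[c, 0.0, -s],
                     [0.0, 1.0, 0.0],
                     [s, 0.0, c]])

def rot_tilt(theta_t):
    """Tilt: rotation about the x-axis (Eq. 3); positive = rotation upward."""
    c, s = np.cos(theta_t), np.sin(theta_t)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, s],
                     [0.0, -s, c]])

# Roll is fixed to the identity, so a pan-then-tilt motion composes as:
R = rot_tilt(np.deg2rad(2.5)) @ rot_pan(np.deg2rad(2.5))

# A rotation matrix is orthonormal: R^{-1} = R^T and det(R) = 1.
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
```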

### 2.2 Camera Model

We assume that the camera is located at the origin of the Cartesian coordinate system and is looking at a plane at distance f, where the principal point is specified; f represents the focal length of the camera. Furthermore, the principal axis coincides with the z-axis, and the image plane is perpendicular to the principal axis. A point in the normalized camera coordinates is denoted by (x, y). Also, the column (v) and row (u) coordinate axes of the reference image plane are parallel to the x-axis and the y-axis of the camera, respectively. The relation between points in the normalized camera coordinates and the image points is as follows:

v = m_v x + v_0   (4)
u = -m_u y + u_0   (5)

Where m_v and m_u account for the width and height of the pixels, respectively, and (v_0, u_0) are the coordinates of the principal point in the image.

Every 3D point in the world that is visible to the camera can be projected onto a specific point of the image plane and can be calculated using the camera intrinsic matrix.

K = \begin{bmatrix} f_v & s & v_0 \\ 0 & -f_u & u_0 \\ 0 & 0 & 1 \end{bmatrix}   (6)

Where f_v = m_v f is the focal length of the camera in the v direction (in pixels) and f_u = m_u f represents the focal length of the camera in the u direction. With modern cameras it is reasonable to assume that image pixels are square, and so the value of the camera skew (s) is zero.

Also, any camera transformation is equivalent to a similar transformation of the scene but in the opposite direction kanatani1987camera . For stationary cameras that freely rotate but stay in a fixed location, the camera transformation is modeled only by its rotation. In other words, the translation of the camera is zero. Therefore, every point u in an image seen by a stationary camera is transformed to a point u' in another image taken after the camera rotation. The mathematical relationship between u and u' is thus represented by:

w u' = K R^T K^{-1} u   (7)

Where w is the scale of the projection and represents the depth of the point. It should be noted that R^{-1} = R^T because the rotation matrix is orthonormal.
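A minimal numeric sketch of Eq. 7, with hypothetical intrinsic values and a 2.5° pan: the homography K Rᵀ K⁻¹ maps a homogeneous image point (v, u, 1) to its location after the rotation, up to the scale w. For a pure pan the u coordinate barely moves while v shifts noticeably:

```python
import numpy as np

# Hypothetical intrinsics following Eq. 6 (note the -fu entry and s = 0).
fv, fu, v0, u0 = 772.0, 772.0, 320.0, 240.0
K = np.array([[fv, 0.0, v0],
              [0.0, -fu, u0],
              [0.0, 0.0, 1.0]])

def rot_pan(theta_p):
    c, s = np.cos(theta_p), np.sin(theta_p)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

R = rot_pan(np.deg2rad(2.5))
H = K @ R.T @ np.linalg.inv(K)        # the homography of Eq. 7

p = np.array([350.0, 260.0, 1.0])     # homogeneous image point (v, u, 1)
q = H @ p
v_new, u_new = q[0] / q[2], q[1] / q[2]   # divide out the scale w
```

Tracing the numbers through, the point moves by roughly 34 pixels in v but only about 0.05 pixels in u, which is the observation Section 2.3 exploits.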

### 2.3 Focal Length in the v Direction

An image taken after a pan rotation of a camera provides a very straightforward formulation to estimate the focal length of the camera. In fact, it imposes two constraints on the camera rotations, namely no rotation around the x and z axes. The resulting projection equation after substituting R_y for R is:

w u' = K R_y^T K^{-1} u   (8)

R_y has only one DoF, which is the angle of rotation around the y-axis. After expanding and simplifying Eq. 8 and eliminating the scale of projection, the following direct projection equations are obtained.

v' = \frac{r_{11}(v - v_0) + r_{31} f_v}{r_{13}\frac{v - v_0}{f_v} + r_{33}} + v_0   (9)
u' = u_0 - \frac{u_0 - u}{r_{13}\frac{v - v_0}{f_v} + r_{33}}   (10)

Where r_{ij} denotes the element of R_y at row i and column j. After simplification of Eq. 10:

\frac{v - v_0}{f_v} = \frac{\frac{u_0 - u}{u_0 - u'} - r_{33}}{r_{13}}   (11)

Note that after a pure pan rotation, the u coordinates of the new image are hardly affected by the transformation. (The reader is referred to junejo2012optimizing for a detailed explanation and analysis of this fact.) In other words, image pixels move almost purely horizontally, and so the rate of change in the u direction before and after the pan rotation is close to one, viz:

\frac{u_0 - u}{u_0 - u'} \approx 1   (12)

Substituting Eq. 12 into Eq. 11 and then replacing the term (v − v_0)/f_v in Eq. 9 with the expression obtained, we have:

v' \approx \frac{r_{11}(v - v_0) + r_{31} f_v}{r_{13}\frac{1 - r_{33}}{r_{13}} + r_{33}} + v_0   (13)

The above substitution changes the value of the denominator to 1 and hence simplifies the whole projection equation.

v' - r_{11} v \approx r_{31} f_v + (1 - r_{11}) v_0   (14)

Since Eq. 14 is linear, one might think that it can be solved by constructing a linear system of equations using the matched points from two images taken after the pan rotations of the camera. Unfortunately, the equation is numerically unstable because the coefficient (1 − r_{11}) is close to zero, which causes ambiguity in calculating the shift in the principal point agapito2001self . In short, we cannot calculate the location of the principal point in the v direction from a camera rotated purely around the y axis. Knowing that the principal point is close to the center of the image (c_v, c_u) = (W/2, H/2), where W and H represent the image width and height, respectively, we replace v_0 with c_v in Eq. 14. Thus, we can derive a suitable linear equation to estimate the focal length in the v direction from an image taken after a pan rotation.

f_v \approx \frac{v' - r_{11} v - (1 - r_{11}) c_v}{r_{31}}   (15)

Eq. 15 needs only one point (v, u) in the reference image that corresponds to (v', u') in the transformed image. If there are more point correspondences, we can easily average over these points to obtain more robust results.
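Eq. 15 can be checked end-to-end on synthetic data. The sketch below (all numeric values are hypothetical) projects a single reference point through the pan homography of Eq. 8 and recovers f_v from that one correspondence; the small residual error comes from the approximation of Eq. 12 and from substituting the image center c_v for v_0:

```python
import numpy as np

# Simulated ground truth (hypothetical values); the true principal point
# is deliberately a few pixels away from the assumed image center.
fv_true, fu, v0, u0 = 772.55, 774.0, 314.0, 244.0
cv = 320.0                                  # assumed image center column (W/2)
K = np.array([[fv_true, 0, v0], [0, -fu, u0], [0, 0, 1.0]])

theta_p = np.deg2rad(2.5)                   # small pan angle, known from the PTZ head
c, s = np.cos(theta_p), np.sin(theta_p)
Ry = np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])
r11, r31 = Ry[0, 0], Ry[2, 0]

# One correspondence: project an image point through the pan homography (Eq. 8).
p = np.array([360.0, 200.0, 1.0])           # (v, u, 1) in the reference image
q = K @ Ry.T @ np.linalg.inv(K) @ p
v_prime = q[0] / q[2]

# Eq. 15: closed-form focal length from a single point pair.
fv_est = (v_prime - r11 * p[0] - (1 - r11) * cv) / r31
```

With these numbers the recovered focal length lands within roughly one percent of the ground truth, which is adequate as an initial guess for non-linear refinement.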

### 2.4 Focal Length in the u Direction

So far, we could estimate f_v from the information provided by an image taken after a pan rotation. We repeat the same procedure to approximate f_u. This time we need an image taken after a pure tilt rotation of the camera. Thus, the projection equation is characterized by replacing R with R_x, and the coordinates of a point in the reference image and a point in the tilted image are related by:

v' = \frac{v - v_0}{r_{23}\frac{u_0 - u}{f_u} + r_{33}} + v_0   (16)
u' = u_0 - \frac{r_{22}(u_0 - u) + r_{32} f_u}{r_{23}\frac{u_0 - u}{f_u} + r_{33}}   (17)

Following the same reasoning as in Section 2.3, a closed-form solution to estimate the focal length of the camera in the u direction is obtained by:

f_u \approx \frac{r_{22} u - u' + (1 - r_{22}) c_u}{r_{32}}   (18)
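The tilt case mirrors the pan case. A short sketch (hypothetical values again) that generates one correspondence with the projection equation under a pure tilt and applies Eq. 18:

```python
import numpy as np

# Hypothetical ground truth; cu is the assumed image center row (H/2).
fv, fu_true, v0, u0, cu = 772.55, 774.0, 314.0, 244.0, 240.0
K = np.array([[fv, 0, v0], [0, -fu_true, u0], [0, 0, 1.0]])

tt = np.deg2rad(2.5)                         # small tilt angle (Eq. 3)
Rx = np.array([[1, 0, 0],
               [0, np.cos(tt), np.sin(tt)],
               [0, -np.sin(tt), np.cos(tt)]])
r22, r32 = Rx[1, 1], Rx[2, 1]

p = np.array([360.0, 200.0, 1.0])            # (v, u, 1) in the reference image
q = K @ Rx.T @ np.linalg.inv(K) @ p          # tilt homography, Eq. 7 with R = R_x
u_prime = q[1] / q[2]

# Eq. 18: closed-form f_u from the vertical displacement of a single point.
fu_est = (r22 * p[1] - u_prime + (1 - r22) * cu) / r32
```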

### 2.5 Principal Point

To estimate the location of the principal point we need to impose one constraint on the rotation matrix, which can be achieved by preventing the camera from rotating around the z-axis. In real applications the easiest way is to mount the camera on a tripod. When working in a robotic environment or with a PTZ camera, controlling the roll rotation is straightforward since the camera has already been mounted or fixed. Therefore, we match an image taken after a pan and tilt rotation of the camera with the reference image and use the acquired point correspondences to estimate the location of the principal point.

Following the general projection equation in Eq.7, the direct equations for relating the location of a point in the reference image to its matched point in the transformed image are described by:

v' = \frac{r_{11}(v - v_0) + r_{21}(u_0 - u)\frac{f_v}{f_u} + r_{31} f_v}{r_{13}\frac{v - v_0}{f_v} + r_{23}\frac{u_0 - u}{f_u} + r_{33}} + v_0   (19)
u' = u_0 - \frac{r_{12}(v - v_0)\frac{f_u}{f_v} + r_{22}(u_0 - u) + r_{32} f_u}{r_{13}\frac{v - v_0}{f_v} + r_{23}\frac{u_0 - u}{f_u} + r_{33}}   (20)

Where, except for v_0 and u_0, all other terms are known. After simplifying the equations and collecting the coefficients of the various powers of v_0 and u_0, we see that the equations are nonlinear due to the presence of quadratic terms in v_0 and u_0 in Eq. 19 and Eq. 20, respectively. Nevertheless, the equations can be solved using any nonlinear solver such as Levenberg-Marquardt. But, in order to let the nonlinear solver converge towards the true global minimum, we first need a reasonable initial guess. Here, we propose a method to linearize Eq. 19 and Eq. 20 to achieve a close estimate of the location of the principal point when the focal lengths and the camera rotations are known.

A feasible approach to linearizing the projection equations is to decrease the contributions of the two above-mentioned nonlinear terms and then eliminate them from the equations. The value of the nonlinear terms depends on two factors: their coefficients, built from the rotation-matrix elements r_{13} and r_{23}, and the values of the unknowns v_0 and u_0. The former have already been reduced by our initial assumption of rotating the camera by small angles. Next, we show that the unknowns can also be reduced to smaller values by manipulating the scale of the point coordinates.

Estimating the principal point by a nonlinear algorithm is known to be arduous since it tends to fit to noise agapito2001self . Due to this sensitivity to noise, researchers have taken advantage of including some prior knowledge about the distribution of the principal point. It is reasonable to expect that the principal point is close to the center of the image. This prior knowledge is the basis of the Maximum a Posteriori Estimation employed in agapito2001self to alter the cost function of the minimization problem. In order to arrive at a linear system, we employ the same idea. Specifically, we assume that the principal point is only slightly shifted from the center of the image.

v_0 = c_v + \delta_v, \qquad u_0 = c_u + \delta_u   (21)

Where δ_v and δ_u represent the amount of shift in the v and u directions, respectively, and (c_v, c_u) are the coordinates of the center of the image. Following this change, we replace each term v − v_0 with \hat{v} − δ_v and every u_0 − u with \hat{u} + δ_u in Eq. 19 and Eq. 20, where \hat{v} = v − c_v and \hat{u} = c_u − u. Therefore, after some simplifications, the general projection equations can be rewritten based on the new variables.

G\delta_v^2 + H\delta_v\delta_u + (A + I - G\hat{v}')\delta_v + (B - H\hat{v}')\delta_u = I\hat{v}' - C   (22)
-H\delta_u^2 - G\delta_v\delta_u + (D - G\hat{u}')\delta_v + (E - I - H\hat{u}')\delta_u = I\hat{u}' - F   (23)

Where \hat{v}' = v' - c_v, \hat{u}' = c_u - u', and

A = -r_{11}, \qquad B = r_{21}\frac{f_v}{f_u}, \qquad C = r_{11}\hat{v} + r_{21}\hat{u}\frac{f_v}{f_u} + r_{31} f_v
D = -r_{12}\frac{f_u}{f_v}, \qquad E = r_{22}, \qquad F = r_{12}\hat{v}\frac{f_u}{f_v} + r_{22}\hat{u} + r_{32} f_u
G = -\frac{r_{13}}{f_v}, \qquad H = \frac{r_{23}}{f_u}, \qquad I = r_{13}\frac{\hat{v}}{f_v} + r_{23}\frac{\hat{u}}{f_u} + r_{33}

By including our prior knowledge about the image center into the equations, we significantly reduce the values of nonlinear terms and allow them to be ignored. Once the nonlinear terms are removed, a linear system of equations can be constructed using the detected point correspondences.

\begin{bmatrix}
A + I_1 - G\hat{v}'_1 & B - H\hat{v}'_1 \\
\vdots & \vdots \\
A + I_n - G\hat{v}'_n & B - H\hat{v}'_n \\
D - G\hat{u}'_1 & E - I_1 - H\hat{u}'_1 \\
\vdots & \vdots \\
D - G\hat{u}'_n & E - I_n - H\hat{u}'_n
\end{bmatrix}
\begin{bmatrix} \delta_v \\ \delta_u \end{bmatrix}
=
\begin{bmatrix}
I_1\hat{v}'_1 - C_1 \\ \vdots \\ I_n\hat{v}'_n - C_n \\
I_1\hat{u}'_1 - F_1 \\ \vdots \\ I_n\hat{u}'_n - F_n
\end{bmatrix}   (24)

Where a subscript i indicates that the corresponding term is evaluated at the coordinates of the i-th point and n is the number of correspondences. As shown above, the system of equations is constructed in the form Ax = b and can easily be solved using a least-squares method or any linear solver (e.g., MATLAB solves such a system with the backslash command, x = A\b). However, since there are only two unknowns, detecting a single point correspondence already allows us to solve the system for δ_v and δ_u. For better estimates, one can use more point correspondences.

Figure 2: Focal length calculated in the v and u directions using Active Calibration Strategy B (AC) versus SAC for various angles of rotation. In SAC we use only one point correspondence.
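The principal-point step can be sketched under stated assumptions: ground-truth values are hypothetical, the focal lengths are taken as known (as the two previous steps provide), and the pan-then-tilt motion is composed as R = R_x R_y with the roll left as the identity. The code builds the linearized rows of Eq. 24 from synthetic correspondences and solves for (δ_v, δ_u) by least squares:

```python
import numpy as np

# Hypothetical ground truth; focal lengths assumed already estimated.
fv, fu = 772.55, 774.0
v0, u0 = 314.0, 244.0                 # true principal point
cv, cu = 320.0, 240.0                 # image center (W/2, H/2)
K = np.array([[fv, 0, v0], [0, -fu, u0], [0, 0, 1.0]])

# Small pan followed by tilt (roll stays the identity).
tp, tt = np.deg2rad(2.0), np.deg2rad(-1.5)
Ry = np.array([[np.cos(tp), 0, -np.sin(tp)], [0, 1, 0],
               [np.sin(tp), 0, np.cos(tp)]])
Rx = np.array([[1, 0, 0], [0, np.cos(tt), np.sin(tt)],
               [0, -np.sin(tt), np.cos(tt)]])
R = Rx @ Ry                           # R[i-1, j-1] = r_ij
Hmg = K @ R.T @ np.linalg.inv(K)      # the homography of Eq. 7

# Point-independent coefficients of Eqs. 22-23.
A, B = -R[0, 0], R[1, 0] * fv / fu
D, E = -R[0, 1] * fu / fv, R[1, 1]
G, H = -R[0, 2] / fv, R[1, 2] / fu

pts = np.array([[350.0, 200.0], [280.0, 290.0], [400.0, 260.0], [250.0, 190.0]])
rows, rhs = [], []
for v, u in pts:
    q = Hmg @ np.array([v, u, 1.0])
    vp, up = q[0] / q[2], q[1] / q[2]
    vh, uh = v - cv, cu - u           # \hat{v}, \hat{u}
    vph, uph = vp - cv, cu - up       # \hat{v}', \hat{u}'
    C = R[0, 0] * vh + R[1, 0] * uh * fv / fu + R[2, 0] * fv
    F = R[0, 1] * vh * fu / fv + R[1, 1] * uh + R[2, 1] * fu
    I = R[0, 2] * vh / fv + R[1, 2] * uh / fu + R[2, 2]
    rows += [[A + I - G * vph, B - H * vph], [D - G * uph, E - I - H * uph]]
    rhs += [I * vph - C, I * uph - F]

dv, du = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
v0_est, u0_est = cv + dv, cu + du     # should land near the true (314, 244)
```

Because the dropped quadratic terms are scaled by r_13/f_v and r_23/f_u, the recovered shift is accurate to a fraction of a pixel for small angles.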

## 3 Results and Analysis

In order to better understand the performance of the proposed Simplified Active Calibration, we perform several experiments to clarify how the method behaves for various rotation angles. We show that rotating by small angles is crucial to obtaining good results.

Based on our proposed method, the focal lengths in the v and u directions can be estimated using Eq. 15 and Eq. 18, respectively. Only one point correspondence is required to calculate each focal length. Fig. 2 shows the focal lengths estimated for various pan and tilt angles. It can be seen that when the pan and tilt angles are small, the estimated focal lengths are very close to the ground truth.

The magnitude of the pan and tilt angles affects the estimation of the principal point location as well. In fact, two successive rotations (pan and tilt) are required to calculate the center of projection. Thus, in another experiment, we rotate the camera by 900 combinations of pan and tilt rotations (from −7.5° to 7.5°), and then use the projected points on the image plane to estimate the principal point location with the proposed formulation (Eq. 24). Note that we use either the estimated values of f_v and f_u calculated in the previous step or the actual focal length. The results are shown in Fig. 3. We can see that for small rotations, the results obtained by our formulation are close to the real location, which is identified in the figure by a red plane. Even when the focal length is not accurate, a good estimate of the principal point can be found if the pan and tilt rotations are small.

Figure 3: Coordinates of the principal points calculated after various pan/tilt rotations of random 3D points. Colors are distributed based on the L2 norm of the pan and tilt angles. a) Values obtained for u0 when inaccurate focal lengths (fu=774.71 and fv=771.18) are used; MSE(u0)=1.49 pixels over all combinations of pan and tilt angles. b) Values obtained for v0 when inaccurate focal lengths (fu=774.71 and fv=771.18) are used; MSE(v0)=2.30 pixels over all combinations of pan and tilt angles. c) Values obtained for u0 when accurate focal lengths (Fu=Fv=772.55) are used; MSE(u0)=0.05 pixels over all combinations of pan and tilt angles. d) Values obtained for v0 when accurate focal lengths (Fu=Fv=772.55) are used; MSE(v0)=0.04 pixels over all combinations of pan and tilt angles. The red plane specifies the ground truth.

In addition, Fig. 3 shows that the error caused by the angle variation is more negligible than the error caused by inaccurate values of the focal length. Specifically, the Mean Square Error of u_0, MSE(u0), is 1.49 pixels when inaccurate focal lengths are used (Fig. 3(a)). By contrast, when the actual focal length is used in Eq. 24, MSE(u0) decreases to 0.05 pixels (Fig. 3(c)). This reveals the significance of an accurate focal length in calculating the principal point location. A similar analysis holds for the other axis (v_0), shown in Fig. 3(b) and Fig. 3(d).

Knowing how many point correspondences are required to calculate the principal point location (by Eq. 24) is crucial. Based on our experiments, with only four points that are uniformly distributed in the image, a good estimate of the principal point location can be obtained. Fig. 4(a) illustrates principal point locations (on the image plane) obtained by solving Eq. 24 with inaccurate focal lengths and only four point correspondences of the teapot point cloud. Fig. 4(b) shows the estimated principal points obtained by solving Eq. 24 with inaccurate focal lengths and 500 point correspondences of random 3D points. Both experiments are carried out on 900 combinations of pan and tilt angles ranging from −7.5° to 7.5°. Pan and tilt angles are included in Fig. 4 by calculating the L2 norm of the angles and assigning meaningful colors to them, ranging from red (closer to zero) to blue (bigger angles). As can be seen in Fig. 4(a), when the pan and tilt rotations are small, even with four point correspondences a principal point that is very close to the actual principal point (specified by a red cross) can be calculated. Using more point correspondences, we obtain an almost similar error distribution, as shown in Fig. 4(b).

Figure 4: The estimated locations of the principal point on the image plane for combinations of various rotation angles (from −7.5° to 7.5°) using Eq. 24. Colors are distributed based on the L2 norm of the pan and tilt angles. a) Results for solving with only four point correspondences of the teapot point cloud. b) Results for solving with 500 point correspondences of the random 3D points. The actual principal point location is (314, 244).

In another experiment, we apply the proposed Simplified Active Calibration formulation to 1000 different runs of 500 randomly generated 3D points for small pan and tilt angles. The order of calculating the intrinsics was specified earlier. The mean and standard deviation of the results obtained are shown in Table 1. As we can see, our proposed formulation attains results very close to the ground truth. Specifically, the error in the principal point location is less than one pixel and the error in the focal length estimates is less than 2 pixels.

All things considered, we assessed the proposed Simplified Active Calibration formulation on simulated scenes in ideal situations, i.e., when the 3D rays are not altered by camera lens distortions and when there is no noise in the scene. We showed that the proposed formulation can estimate the camera intrinsics when the camera rotation is small and pure. In fact, for calculating the focal length we used the so-called “degenerate camera configuration.” Moreover, we demonstrated that using small rotations one can compensate for the error caused by inaccuracy in the estimated focal length when finding the principal point. In other words, rotating the camera by small angles lessens the influence of an inaccurate focal length in calculating the principal point location.

Figure 5: The error caused by uncertainty in determining the angle of the camera. a) The effect of uncertainty in the camera pan rotation on calculating the focal length in the v direction by SAC. b) The effect of uncertainty in the camera tilt rotation on calculating the focal length in the u direction by SAC. c) The effect of uncertainty in the camera pan and tilt rotations on calculating the v coordinate of the principal point by SAC. d) The effect of uncertainty in the camera pan and tilt rotations on calculating the u coordinate of the principal point by SAC.

### 3.1 Noise Analysis

All of the above-mentioned experiments were done in ideal situations where the angles acquired from the camera and the location of matched points were assumed to be exact. In real-world conditions, however, angles and point correspondences are noisy. In the following sections we try to understand how the proposed method works in real-world conditions where parameters are contaminated by various types of noise.

### 3.2 Angular Uncertainty

Acquiring the rotation angles requires either specific devices such as gyroscopes or a specially designed camera called a PTZ camera. Even using these devices does not guarantee that the extracted rotation angles are noise-free. To simulate the noisy conditions of a real-world application, we contaminated the angles of the above-mentioned teapot sequences with increasing angular errors.

While the point correspondences are kept fixed for all of the pan and tilt rotations, we calculate the focal lengths (Eq. 15 and Eq. 18) and principal point coordinates (Eq. 24) using contaminated pan and tilt angles. The results are shown in Fig. 5. Specifically, Fig. 5(a) and Fig. 5(b) show the error of our proposed formulas for estimating the focal length when the pan and tilt angles are not accurate. Every sequence has been coloured based on its rotation angle, ranging from blue for smaller angles to red for larger angles. For focal length estimation, Fig. 5(a) and Fig. 5(b) illustrate that the sequences taken with smaller angles have a steeper slope than the sequences acquired with larger rotation angles. This shows that the focal lengths are more sensitive to angular noise when the camera is rotated by smaller angles.

Figure 6: The error caused by uncertainty in the location of points. a) Error of the estimated focal length in the v direction using SAC when the locations of the teapot points are disturbed by different values of σpixel. b) Error of the estimated focal length in the u direction using SAC under the same conditions as in (a). c) Error of the estimated v0 using SAC under the same conditions as in (a). d) Error of the estimated u0 using SAC under the same conditions as in (a).

Fig.5(c) and Fig.5(d) demonstrate how uncertainty in angular values affects estimating the principal point coordinates. Similar to the effect of noise on focal lengths, the distribution of the red colours (greater angles) around the zero line in Fig.5(c) and Fig.5(d) indicates that the estimates of the principal point coordinates are less affected by the angular noise when the angle of rotation is not very small.
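The sensitivity just described can be reproduced with a small experiment (hypothetical camera values; a deterministic 0.05° angle offset standing in for gyroscope noise): the same angular error produces a much larger focal-length error when the pan angle is small:

```python
import numpy as np

fv_true, fu, v0, u0, cv = 772.55, 774.0, 320.0, 240.0, 320.0
K = np.array([[fv_true, 0, v0], [0, -fu, u0], [0, 0, 1.0]])

def pan_err(theta_deg, noise_deg, p=(370.0, 210.0)):
    """Project a point with the true pan angle, then apply Eq. 15 with a
    contaminated angle; return the focal-length error in pixels."""
    tp = np.deg2rad(theta_deg)
    Ry = np.array([[np.cos(tp), 0, -np.sin(tp)], [0, 1, 0],
                   [np.sin(tp), 0, np.cos(tp)]])
    q = K @ Ry.T @ np.linalg.inv(K) @ np.array([p[0], p[1], 1.0])
    v_prime = q[0] / q[2]
    tn = np.deg2rad(theta_deg + noise_deg)        # noisy angle read-out
    r11, r31 = np.cos(tn), np.sin(tn)
    fv_est = (v_prime - r11 * p[0] - (1 - r11) * cv) / r31
    return abs(fv_est - fv_true)

# The same 0.05 deg uncertainty hurts far more at a small rotation angle.
err_small, err_large = pan_err(0.5, 0.05), pan_err(5.0, 0.05)
```

Intuitively, the error behaves roughly like f_v times the relative angle error, so halving the rotation angle doubles the damage done by a fixed angular uncertainty.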

Overall, when the camera is rotated by small angles, the influence of angular noise on the SAC equations is significant. On the other hand, SAC relies on the benefits of rotating the camera by small angles. Therefore, to avoid magnifying the effect of noise it is important not to rotate the camera by very small angles. If the pan and tilt angles of the camera are not very small, the error in the estimated focal lengths remains small enough for them to serve as close initial guesses for further non-linear refinements. Nonetheless, our experiments with real images in Section 3.4 reveal that SAC can be used in real situations, with the angular noise making the estimates only slightly inaccurate.

Figure 7: A screenshot of the designed Automatic Camera Control application, which is able to rotate the camera by specific angles about the Y-axis (pan) and X-axis (tilt). The camera is a Canon VC-C50i.

Figure 8: Two sequences of real images used for SAC. Every row represents one sequence. a) A reference image. b) Image taken after panning the camera by 0.5625°. c) Image taken after tilting the camera by −0.675°. d) Image taken after first panning the camera by 0.7875° and then tilting the camera by −0.675°. e) A reference image. f) Image taken after panning the camera by −4.6125°. g) Image taken after tilting the camera by −4.1625°. h) Image taken after first panning the camera by 4.3875° and then tilting the camera by −4.6125°.

### 3.3 Point Correspondence Noise

Another type of noise that affects the SAC equations is noise in the location of the features used for matching. To simulate such conditions, we assume that the location of every teapot point is disturbed by zero-mean Gaussian noise with standard deviation σpixel. We then calibrate the camera using SAC for every σpixel in the tested range. The intrinsic parameters obtained are illustrated in Fig.6.
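The perturbation step of this experiment can be sketched as follows; the point coordinates are synthetic stand-ins, since the teapot model and simulated camera are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical projected image locations (in pixels) standing in for the
# teapot points; the paper's simulated camera is not reproduced here.
points = rng.uniform(0.0, 640.0, size=(100, 2))

noisy_sets = {}
for sigma_pixel in (0.1, 0.5, 1.0, 2.0):   # assumed test range of noise levels
    noise = rng.normal(0.0, sigma_pixel, size=points.shape)
    noisy_sets[sigma_pixel] = points + noise   # disturbed point locations
```

Each disturbed point set would then be fed through the SAC equations in place of the exact projections, and the resulting intrinsic parameters compared against the ground truth.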

Fig.6(a) and Fig.6(b) illustrate the influence of pixel noise on the estimation of the focal length (Eq.15 and Eq.18). Also, Fig.6(c) and Fig.6(d) show how SAC estimates the coordinates of the principal point (Eq.24) in noisy conditions. Colours are assigned based on the rotation angle of the camera and, hence, their distribution reveals how noise affects the SAC equations. In fact, the high concentration of red, yellow, and orange points around the zero error line in Fig.6(a) to (d) reveals that when the angle of the camera rotation is not very small, SAC achieves low-error estimates for the focal lengths. This corroborates the claim that very small camera rotations cause the SAC formulations to produce high-error results.

### 3.4 Real Images

We studied the proposed SAC formulations on real images as well. We used a Canon VC-C50i PTZ camera that is able to freely rotate around the Y-axis (pan) and the X-axis (tilt). The camera can be controlled by an application called Automatic Camera Control (ACC, created by the first author) that uses a standard RS-232 serial connection to control the camera. The required pan and tilt rotation angles can be encoded in a specific packet and written to the camera's serial buffer, causing the camera to rotate by the assigned angles. A screenshot of the ACC application can be seen in Fig.7.

Using the above-mentioned procedure, we took 8 sequences of images for evaluating the proposed SAC formulations. Fig.8 shows two sequences of our bookshelf scene. All sequences were taken using a fixed zoom. While keeping the zoom of the camera unchanged, another 30 images of a checkerboard pattern were acquired from various viewpoints. The ground truth for the intrinsic parameters was calculated by applying the method of Zhang zhang1999flexible to the checkerboard images.

The performance of the SAC formulations on the 8 sequences of real images is reported in Table 2. For every sequence, we only used the images in that sequence. For example, to calculate the focal length in the u direction for Sequence 1, we found the point correspondences using faraji2015erel ; faraji2015extremal between the reference image (Fig.8(a)) and the image taken after the pan rotation of the camera (Fig.8(b)). Then, we used only the matched point closest to the center of the image. Although we did not include the lens distortion parameter in the SAC formulation (because it would create non-linear equations), using a matched point close to the image center decreases the inaccuracy of the focal length estimates, since the results there are less affected by lens distortion. A similar procedure was adopted with the image taken after a tilt rotation of the camera (Fig.8(c)) to calculate the focal length in the v direction for Sequence 1. The location of the principal point was estimated by corresponding the reference image (Fig.8(a)) to the image taken after panning and tilting the camera (Fig.8(d)). To solve the linear system of equations of SAC (Eq.24), we used all of the matched points.
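Solving an overdetermined linear system with all matched points amounts to a least-squares fit. The sketch below uses synthetic coefficients as a stand-in for Eq.24, whose actual entries (which depend on the pan/tilt angles and the estimated focal lengths) are not reproduced in this section; the principal point value and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
true_pp = np.array([320.0, 240.0])          # assumed principal point (u0, v0)
# One linear constraint per matched point; coefficients here are synthetic.
A = rng.normal(size=(40, 2))                # 40 matched points
b = A @ true_pp + rng.normal(0.0, 0.5, 40)  # mildly noisy right-hand side

# Least-squares solution of the stacked system A @ [u0, v0] = b.
pp_est, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
```

Stacking all matched points instead of a minimal pair averages out the per-point noise, which is why the text above recommends including more correspondences when the principal point estimate is inaccurate.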

The errors reported by applying SAC on 8 different sequences of real images in Table 2 show that in spite of the presence of various types of noise, such as angular uncertainties, point correspondence noise and lens distortion, the focal lengths estimated by SAC are close to the results of the method of Zhang zhang1999flexible , except when the angles of rotation are very small. As we discussed earlier, the SAC formulation for estimating the coordinates of the principal point is sensitive to angular noise. Therefore, the error in the principal point estimate is sometimes increased; for example, in Sequences 3 and 4 of Table 2. This error can be decreased by including more matched points (in Eq.24) taken after panning and tilting the camera by various angles.

## 4 Conclusion

In this paper we presented a new Simplified Active Calibration formulation. Our derivations provided closed-form and linear equations to estimate the parameters of the camera using three image pairs taken before and after panning, tilting, and panning-tilting the camera.

A basic assumption was made about the rotation of a fixed camera; i.e., solving the proposed equations requires knowing the rotation angles of the camera. The proposed formulation can be used in practical applications such as surveillance, because accessing the camera motion information is straightforward in PTZ and mobile phone cameras.

The proposed closed-form formulations for estimating the focal lengths can be solved with only one point correspondence, and finding that correspondence is straightforward: following recent developments in feature extractors, one can extract repeatable regions from a pair of images. This is especially useful for applications that favour avoiding point correspondences; instead of the reference and transferred points in Eq.15 and Eq.18, the average of the edge points or the centroid of a region can be used.
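The region-centroid alternative mentioned above takes only a few lines; the binary mask below is a toy stand-in for a repeatable region returned by an extractor such as EREL:

```python
import numpy as np

# Toy binary mask standing in for one repeatable region detected in an
# image; the region's shape and location here are arbitrary.
mask = np.zeros((10, 10), dtype=bool)
mask[3:6, 4:8] = True                        # rows 3-5, columns 4-7

ys, xs = np.nonzero(mask)                    # pixel coordinates of the region
centroid = np.array([xs.mean(), ys.mean()])  # (u, v) centroid in image coords
```

The centroid of the corresponding region in the rotated image would then play the role of the transferred point in the focal-length equations.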

The results of solving our proposed formulations on randomly simulated 3D scenes indicated a very low error rate in estimating the focal lengths and the principal point location. We evaluated our proposed SAC formulation under two different noise conditions, namely angular and pixel noise. The simulated results showed that if the angle of rotation is not very small, the error caused by using the SAC formulation is low and can be alleviated by a further non-linear refinement. This conclusion was later verified in our experiment with real images. Our future work will focus on including non-linear parameters in the Simplified Active Calibration equations, using the result of the current study as a close initial guess for an optimization procedure.

## 5 Acknowledgements

The authors acknowledge the financial support of NSERC, Canada, in making this work possible.

## References

• (1) F. Zhao, T. Tamaki, T. Kurita, B. Raytchev, K. Kaneda, Marker-based non-overlapping camera calibration methods with additional support camera views, Image and Vision Computing 70 (2018) 46–54.
• (2) S. J. Maybank, O. D. Faugeras, A theory of self-calibration of a moving camera, International Journal of Computer Vision 8 (2) (1992) 123–151.
• (3) O. D. Faugeras, Q.-T. Luong, S. J. Maybank, Camera self-calibration: Theory and experiments, in: European conference on computer vision, Springer, 1992, pp. 321–334.
• (4) A. Basu, Active calibration, in: Robotics and Automation, 1993. Proceedings., 1993 IEEE International Conference on, IEEE, 1993, pp. 764–769.
• (5) A. Basu, Active calibration: Alternative strategy and analysis, in: Computer Vision and Pattern Recognition, 1993. Proceedings CVPR’93., 1993 IEEE Computer Society Conference on, IEEE, 1993, pp. 495–500.
• (6) J. Aloimonos, I. Weiss, A. Bandyopadhyay, Active vision, International journal of computer vision 1 (4) (1988) 333–356.
• (7) R. I. Hartley, Self-calibration of stationary cameras, International Journal of Computer Vision 22 (1) (1997) 5–23.
• (8) F. Du, M. Brady, Self-calibration of the intrinsic parameters of cameras for active vision systems, in: Computer Vision and Pattern Recognition, 1993. Proceedings CVPR’93., 1993 IEEE Computer Society Conference on, IEEE, 1993, pp. 477–482.
• (9) L. Dron, Dynamic camera self-calibration from controlled motion sequences, in: Computer Vision and Pattern Recognition, 1993. Proceedings CVPR’93., 1993 IEEE Computer Society Conference on, IEEE, 1993, pp. 501–506.
• (10) G. P. Stein, Accurate internal camera calibration using rotation, with analysis of sources of error, in: Computer Vision, 1995. Proceedings., Fifth International Conference on, IEEE, 1995, pp. 230–236.
• (11) R. I. Hartley, Self-calibration from multiple views with a rotating camera, in: European Conference on Computer Vision, Springer, 1994, pp. 471–478.
• (12) L. Agapito, E. Hayman, I. Reid, Self-calibration of rotating and zooming cameras, International Journal of Computer Vision 45 (2) (2001) 107–127.
• (13) M. Pollefeys, R. Koch, L. Van Gool, Self-calibration and metric reconstruction inspite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1) (1999) 7–25.
• (14) T. Elamsy, A. Habed, B. Boufama, Self-calibration of stationary non-rotating zooming cameras, Image and Vision Computing 32 (3) (2014) 212–226.
• (15) J.-M. Frahm, R. Koch, Camera calibration with known rotation., in: ICCV, 2003, pp. 1418–1425.
• (16) J.-M. Frahm, R. Koch, Camera calibration and 3d scene reconstruction from image sequence and rotation sensor data., in: VMV, 2003, pp. 79–86.
• (17) J.-M. Frahm, R. Koch, Robust camera calibration from images and rotation data, in: Joint Pattern Recognition Symposium, Springer Berlin Heidelberg, 2003, pp. 249–256.
• (18) J. Knight, A. Zisserman, I. Reid, Linear auto-calibration for ground plane motion, in: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, Vol. 1, IEEE, 2003, pp. I–I.
• (19) L. Hua, W. Fu-Chao, H. Zhan-Yi, A new linear camera self-calibration technique, Chinese J. Computers 23 (11) (2000) 1121–1129.
• (20) I. N. Junejo, H. Foroosh, Practical ptz camera calibration using givens rotations, in: Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, IEEE, 2008, pp. 1936–1939.
• (21) D. Wan, J. Zhou, Self-calibration of spherical rectification for a ptz-stereo system, Image and Vision Computing 28 (3) (2010) 367–375.
• (22) Z. Wu, R. J. Radke, Keeping a pan-tilt-zoom camera calibrated, IEEE transactions on pattern analysis and machine intelligence 35 (8) (2013) 1994–2007.
• (23) R. Galego, A. Bernardino, J. Gaspar, Auto-calibration of pan-tilt cameras including radial distortion and zoom, Advances in Visual Computing (2012) 169–178.
• (24) Q. Sun, X. Wang, J. Xu, L. Wang, H. Zhang, J. Yu, T. Su, X. Zhang, Camera self-calibration with lens distortion, Optik-International Journal for Light and Electron Optics 127 (10) (2016) 4506–4513.
• (25) L. Heng, G. H. Lee, M. Pollefeys, Self-calibration and visual slam with a multi-camera system on a micro aerial vehicle, Autonomous Robots 39 (3) (2015) 259–277.
• (26) M. Brückner, F. Bajramovic, J. Denzler, Intrinsic and extrinsic active self-calibration of multi-camera systems, Machine vision and applications 25 (2) (2014) 389–403.
• (27) P. A. Tresadern, I. D. Reid, Camera calibration from human motion, Image and Vision Computing 26 (6) (2008) 851–862.
• (28) A. Basu, Active calibration of cameras: theory and implementation, IEEE Transactions on Systems, man, and cybernetics 25 (2) (1995) 256–265.
• (29) A. Basu, K. Ravi, Active camera calibration using pan, tilt and roll, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 27 (3) (1997) 559–566.
• (30) M. Faraji, A. Basu, A Simplified Active Calibration algorithm for Focal Length Estimation, ArXiv e-printsarXiv:1806.03584.
• (31) Q. Ji, S. Dai, Self-calibration of a rotating camera with a translational offset, IEEE Transactions on Robotics and Automation 20 (1) (2004) 1–14.
• (32) K.-I. Kanatani, Camera rotation invariance of image characteristics, Computer vision, graphics, and image processing 39 (3) (1987) 328–354.
• (33) I. N. Junejo, H. Foroosh, Optimizing ptz camera calibration from two images, Machine Vision and Applications 23 (2) (2012) 375–389.
• (34) Z. Zhang, Flexible camera calibration by viewing a plane from unknown orientations, in: Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, Vol. 1, IEEE, 1999, pp. 666–673.
• (35) M. Faraji, J. Shanbehzadeh, K. Nasrollahi, T. B. Moeslund, Erel: Extremal regions of extremum levels, in: Image Processing (ICIP), 2015 IEEE International Conference on, IEEE, 2015, pp. 681–685.
• (36) M. Faraji, J. Shanbehzadeh, K. Nasrollahi, T. Moeslund, Extremal regions detection guided by maxima of gradient magnitude, Image Processing, IEEE Transactions on.