 # Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods

Robot-world, hand-eye calibration is the problem of determining the transformation between the robot end-effector and a camera, as well as the transformation between the robot base and the world coordinate system. This relationship has been modeled as AX=ZB, where X and Z are unknown homogeneous transformation matrices. The successful execution of many robot manipulation tasks depends on determining these matrices accurately, and we are particularly interested in the use of calibration for vision tasks. In this work, we describe a collection of methods consisting of two cost function classes, three different parameterizations of rotation components, and separable versus simultaneous formulations. We explore the behavior of this collection of methods on real and simulated datasets, and compare to seven other state-of-the-art methods. Our collection of methods returns greater accuracy on many metrics as compared to the state-of-the-art. The collection of methods is extended to the problem of robot-world hand-multiple-eye calibration, and results are shown with two and three cameras mounted on the same robot.


## 1 Introduction

Figure 1: Hand-Eye Robot-World Calibration: A camera (eye) mounted at the end-effector (hand) of a robot.

The robot-world hand-eye calibration problem consists of determining the homogeneous transformation matrices (HTMs) between the robot hand, or end-effector, to the camera, as well as the transformation of the robot base to the world coordinate system.

The preliminaries of the robot-world hand-eye calibration problem are as follows. Let the transformation from the hand coordinate frame to the camera coordinate frame be X, and the transformation from the robot-base coordinate frame to the world coordinate frame be Z. The transformation of the robot base frame to the hand coordinate frame is B, and is assumed to be known from the robot controller. Finally, the transformation from the world coordinate frame to the camera coordinate frame is represented by the HTM A. A is calculated using a camera calibration procedure such as Zhang's zhang2000flexible , where the world coordinate frame is defined by a calibration object in the workspace. The transformations are illustrated by Figure 1. We note here that the labeling of the transformations in our version of the problem is different from that used in the traditional robot-world hand-eye calibration problem Zhuang1994simultaneous in that some matrices are inverted and the rest are exchanged. Despite this difference, the relationship AX = ZB is still the same in our interpretation of the problem versus others. We interpreted the problem differently because doing so simplifies the derivation of some of the cost functions that we use in Section 2.2.

Given these preliminaries, Zhuang et al. Zhuang1994simultaneous formalized the robot-world hand-eye calibration explicitly as the homogeneous matrix equation AX = ZB, where all of the matrices are 4×4 HTMs, and were the first to provide a method to find solutions for X and Z. Many different positions of the robot and camera are used to generate multiple relationships A_iX = ZB_i, i = 0, 1, …, n−1, where n is the number of robot poses used for the calibration.

The robot-world hand-eye calibration problem is different from, though related to, the hand-eye calibration problem, which was formulated as AX = XB by Shiu and Ahmad shiu1989calibration . In the hand-eye calibration problem, the matrices A and B are considered to be the relative transformations from one pose to another. Use of relative transformations is problematic because decisions must be made as to how to convert absolute transformations into relative ones. This work instead considers A and B as poses with respect to the world coordinate system and robot base coordinate system, respectively, and as a result robot-world hand-eye calibration is needed.

We structure the paper as follows. In the rest of Section 1 and specifically in Subsection 1.1, we provide an overview of existing approaches and recent work. In Subsection 1.2, we discuss our approach and contributions to the robot-world hand-eye calibration problem. Section 2 introduces two classes of proposed methods and extends those classes to the multiple camera case. Section 3 describes the experiments, metrics, and implementation details. In Section 5, the results on real and simulated datasets are shown and discussed. Finally, in Section 6, we present our conclusions.

### 1.1 Recent work

Most recent work on the robot-world hand-eye calibration problem makes use of the decomposition of an HTM into a purely rotational part and a translational part, where R represents a rotation matrix and t a translation vector, as shown in Equation 1.

$$\begin{bmatrix} R_A & t_A \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} R_X & t_X \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} R_Z & t_Z \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} R_B & t_B \\ \mathbf{0}^T & 1 \end{bmatrix} \quad (1)$$

Methods utilizing the decomposition shown in Equation 1 are called separable methods. Separable methods split Equation 1 into two equations: the rotation part, represented by Equation 2, and the translation part, represented by Equation 3.

$$R_A R_X = R_Z R_B \quad (2)$$
$$R_A t_X + t_A = R_Z t_B + t_Z \quad (3)$$
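As a quick numerical sanity check of this decomposition (a minimal numpy sketch of our own, not from the paper), multiplying two HTMs block-wise reproduces exactly the rotation and translation structure of Equations 2 and 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_rot():
    """Random orthonormal rotation matrix via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q * np.sign(np.linalg.det(Q))  # force det = +1

def htm(R, t):
    """Assemble a 4x4 homogeneous transformation matrix from R and t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

RA, RX, RZ, RB = rand_rot(), rand_rot(), rand_rot(), rand_rot()
tA, tX, tZ, tB = (rng.normal(size=3) for _ in range(4))

left, right = htm(RA, tA) @ htm(RX, tX), htm(RZ, tZ) @ htm(RB, tB)
# Rotation blocks multiply (Eq. 2) and translations compose affinely (Eq. 3).
assert np.allclose(left[:3, :3], RA @ RX) and np.allclose(left[:3, 3], RA @ tX + tA)
assert np.allclose(right[:3, :3], RZ @ RB) and np.allclose(right[:3, 3], RZ @ tB + tZ)
```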

Since Equation 3 is linear in t_X and t_Z if R_Z is known, the most frequent approach is to first find R_X and R_Z using Equation 2, and then use that solution and Equation 3 to find t_X and t_Z.

Of those methods that separate the estimation of the rotation and translation parts using Equation 1, there are several different approaches. In Zhuang et al. Zhuang1994simultaneous , a linear solution was proposed for finding the unknowns X and Z by representing the rotation matrices as quaternions, and the translation components were found using linear least squares. Dornaika and Horaud in dornaika1998simultaneous gave a closed-form method for estimating the rotation components using quaternions, which did not require normalization as in Zhuang1994simultaneous . Translation components were estimated with linear least squares as in Zhuang1994simultaneous . Hirsh et al. in hirsh2001iterative proposed a separable, iterative approach in which the estimation of X and Z is alternated. The method consists of assuming that Z is known and estimating X by averaging over all robot positions; this process is repeated in a similar way for Z using the estimate found for X, and the alternating estimation of X and Z continues until termination conditions are met. Like the other methods in this group, estimation of rotation is separated from translation, and rotation is represented by quaternions. In the method of Shah Shah2013Solving , the Kronecker product and singular value decomposition were used to create a closed-form solution.

The second class of methods are typically called simultaneous solutions; they do not decouple rotation and translation error and thus have the advantage that error from the rotation estimation is not propagated to the translation estimation. In dornaika1998simultaneous , in addition to a closed-form separable solution, Dornaika and Horaud present a formulation of the problem as a non-linear least squares problem. The rotation components are represented by 3×3 matrices, so there are 18 parameters for rotation and 6 for translation. The cost function contains penalty terms that enforce orthonormality of the rotation matrices. Following Dornaika and Horaud, Strobl and Hirzinger in Strobl2006Optimal estimate the translation and rotation components through non-linear estimation techniques, though in their approach weights for the rotational and translational parts are chosen automatically and a position/orientation precision ratio parameter is required. The work of Li et al. li2010simultaneous proposed a simultaneous solution that is found using dual quaternions and the Kronecker product.

Besides those approaches that deal with the AX = ZB equality in separable and simultaneous versions, there have been some approaches within the context of the hand-eye calibration problem to refine the estimation of the camera calibration parameters. For example, in Horaud1995Hand , Horaud and Dornaika incorporated camera calibration parameters into a cost function for the hand-eye calibration problem and solved with the Levenberg-Marquardt method, with rotations represented by quaternions. In this method, the requirement that the quaternion must have unit norm was enforced using penalty terms. Recently, Malti malti2013hand , in a work about hand-eye calibration, incorporated a variation of the robot-world hand-eye calibration formulation and used reprojection error together with epipolar constraints to simultaneously refine the camera intrinsic and distortion parameters in addition to the X and Z HTMs. Unlike Horaud and Dornaika Horaud1995Hand , Malti in malti2013hand uses Euler angles to represent rotation, and as a result his method does not need penalty terms.

### 1.2 Our approach and contributions

Our work on the robot-world hand-eye calibration problem is motivated by three particular situations. The first is that we use a robot-mounted camera for multi-view high-quality reconstruction of complicated objects such as leafless trees as in Tabb Tabb2013 ; while some extrinsic camera calibration errors can be tolerated, the reconstruction outcomes improve when the camera calibration error is low. The second situation is where we use the same robot-mounted camera in laboratory or field conditions, where it can be the case that non-experts will be involved with performing calibration and acquiring data from a remote location. For this particular situation, our goal was to devise methods for robot-world hand-eye calibration that are not sensitive to the particular sequence of motions used during the calibration and are robust to non-ideal calibration situations. Third and finally, we have some situations in which multiple cameras are mounted on the same robot end-effector. While it is possible to calibrate only one camera using robot-world hand-eye calibration and then perform stereo camera calibration, we instead desired a way to calibrate all components at the same time so that error from one step is not propagated to the final calibration. This approach also allows cameras to have vastly different fields of view and does not require overlapping fields of view for any two cameras, which is more flexible than stereo calibration.

We note here that the robot-world hand-eye calibration problem has been discussed in the literature for more than two decades. Early contributions focused on linear and/or closed-form solutions because of computational efficiency. However, with the advent of open source nonlinear least square solvers, such as levmar lourakis04LM and Ceres ceressolver , in addition to the general interest in bundle adjustment Triggs1999Bundle , iterative methods with previously unconsidered parameterizations of the rotational components offer advantages over the linear and closed-form approaches.

In this paper, we propose a collection of iterative methods for the robot-world hand-eye calibration problem. There are two classes of cost functions in this collection. The first class of cost functions minimizes the sum of squared differences between A_iX and ZB_i over the n positions of the robot, as given in Equation 4 below or in a closely-related version of it, where ‖·‖_F denotes the Frobenius norm. This class is discussed in Subsection 2.1.

$$\sum_{i=0}^{n-1} \left\| A_i X - Z B_i \right\|_F^2 \quad (4)$$

However, in the first class of cost functions as given in Equation 4, we observed that artifacts and errors related to the calibration object and camera calibration method are sometimes propagated to the estimation of X and Z. The second class of cost functions aims to reduce the influence of these camera calibration artifacts by finding X and Z based on camera reprojection error, without explicitly using the A_i matrices; this is discussed in Subsection 2.2.

Each of the aforementioned two classes of cost functions contains two or more sub-classes or methods: first, to explore different choices of cost functions (separable versus simultaneous solutions), and second, to explore different parameterizations of the rotation components. For the latter, we use three possible parameterizations, namely Euler angles, axis-angle, and quaternions.

Since the two classes of cost functions are nonlinear, we use nonlinear least-squares solvers based on the Levenberg-Marquardt method to find approximate solutions for X and Z. It is important to mention that the above collection of classes and methods extends easily to the problem of calibrating multiple cameras mounted on one robot, as shall be discussed in Subsection 2.3.

Similarities between the state-of-the-art and our proposed collection of methods are as follows. Our methods are iterative, like the method of Hirsh et al. hirsh2001iterative and the iterative method of Dornaika and Horaud dornaika1998simultaneous . The work of Malti malti2013hand concerning the use of camera reprojection error to estimate the HTMs X and Z is similar to our second class of methods when using the Euler angle representation for rotations (Subsection 2.2). One difference is that Malti's work assumes that the motions are relative and as a result uses different definitions of the matrices than we do.

Our contributions to the state-of-the-art in robot-world hand-eye calibration are summarized as follows:

• We provide a comprehensive comparison that explores different choices of cost function, parameter choices, and separable versus simultaneous solutions within our collection of methods, and contrast those methods with the state-of-the-art methods, on real datasets as well as on simulated datasets.

• Our collection of methods is provided to the community as open source code tabb2017solving_dataset .

A portion of the work presented in this paper was previously published as Tabb2015Parameterizations . However, the present paper expands the work of Tabb2015Parameterizations in the following ways:

• This paper evaluates three different parameterizations of the rotation matrices in the experimental results: Euler angles, the axis-angle representation, and quaternions. Also, it discusses the effect of the parameter choice on results. In contrast, the prior work only used Euler angle parameterizations.

• While in Tabb2015Parameterizations only simultaneous methods were proposed, we also created separable formulations of some of the cost functions and as a result were able to demonstrate the differences on the results produced by separability.

• We incorporated a comparison with the work of Shah Shah2013Solving , which was not presented in Tabb2015Parameterizations .

• Prior work only considered one camera. In contrast, the methods presented in this paper are generalized to the multiple-eye, one robot problem, with no limit to the number of cameras used.

• Four additional real datasets were acquired.

• Simulated datasets were added to the experiments, which allow comparisons between the estimated and true solutions and additional analysis of the methods.

## 2 Collection of methods description

Many of the differences among the previous methods concern how orthonormality of the rotation matrices is maintained. In this section, we first describe the parameterizations of the rotational components of X and Z such that orthonormality is preserved, and then discuss in detail the two proposed classes of cost functions that we briefly introduced in the previous section, followed by the extension to multiple cameras.

We explore three different parameterizations: Euler angles, axis-angle, and quaternions. Since these rotation representations are commonplace in the robotics literature and practice, in the next paragraph we give only a brief sketch of the details relevant to each parameterization.

The Euler angle rotation representation is composed of three variables, which represent the rotation angles about the coordinate axes. In this work, we selected a fixed order for the three axis rotations, so a rotation matrix can be represented as the product of three single-axis rotations. The axis-angle representation is another three-variable representation, where a three-element vector represents the axis of rotation and its magnitude represents the angle of rotation about that axis, such that a corresponding rotation matrix is generated with Rodrigues' rotation formula. Finally, quaternions are a four-element representation, where a unit four-element vector can be converted to an orthonormal matrix.
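As an illustrative sketch (not the paper's implementation), each of the three parameterizations can be mapped to an orthonormal rotation matrix; Rodrigues' formula handles the axis-angle case, and the Euler rotation order below is one illustrative choice:

```python
import numpy as np

def euler_to_mat(ax, ay, az):
    """Rotation matrix from Euler angles; the x-then-y-then-z order is illustrative."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def axis_angle_to_mat(v):
    """Rodrigues' formula; the norm of v is the angle, its direction the axis."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def quat_to_mat(q):
    """Rotation matrix from a quaternion (w, x, y, z); normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

# All three mappings produce orthonormal matrices with determinant +1.
examples = (euler_to_mat(0.1, -0.4, 0.7),
            axis_angle_to_mat(np.array([0.2, 0.1, -0.3])),
            quat_to_mat(np.array([0.9, 0.1, -0.2, 0.3])))
for R in examples:
    assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
```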

We now define some notation to help us describe our two classes of cost functions and their extension to multiple cameras. Let the selected rotation representation be denoted by a vector p, with the size of p equal to 3 for the Euler angle and axis-angle representations, and equal to 4 for the quaternion representation. Then the directed cosine matrix representation of p is R(p).

With these preliminaries, we will now present our first class of cost functions.

### 2.1 First class: AX=ZB

The first class of cost functions uses the world-to-camera transformations A_i and base-to-end-effector transformations B_i to estimate the HTMs X and Z. This is achieved by seeking the minimum of c_1 as given below:

$$c_1 = \sum_{i=0}^{n-1} \left\| A_i X - Z B_i \right\|_F^2 \quad (5)$$
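For intuition, c_1 is zero exactly when X and Z are consistent with every pose pair; a small numpy sketch of our own (illustrative only, with synthetic noise-free poses):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_htm():
    """Random homogeneous transformation: QR-based rotation, normal translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    T = np.eye(4)
    T[:3, :3] = Q * np.sign(np.linalg.det(Q))
    T[:3, 3] = rng.normal(size=3)
    return T

X_true, Z_true = rand_htm(), rand_htm()
A = [rand_htm() for _ in range(10)]
# Noise-free poses satisfy A_i X = Z B_i, i.e. B_i = Z^{-1} A_i X.
B = [np.linalg.inv(Z_true) @ Ai @ X_true for Ai in A]

def c1(X, Z):
    """Sum of squared Frobenius norms of A_i X - Z B_i (Equation 5)."""
    return sum(np.linalg.norm(Ai @ X - Z @ Bi, 'fro') ** 2 for Ai, Bi in zip(A, B))

assert np.isclose(c1(X_true, Z_true), 0.0)  # zero at the true solution
assert c1(np.eye(4), np.eye(4)) > 0.0       # nonzero away from it
```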

In the cost function in Equation 5, if we substitute in the rotation representations and the translation vectors for the unknowns X and Z, the minimization problem becomes:

$$\operatorname*{arg\,min}_{p_X, t_X, p_Z, t_Z} \sum_{i=0}^{n-1} \left\| A_i \begin{bmatrix} R(p_X) & t_X \\ \mathbf{0}^T & 1 \end{bmatrix} - \begin{bmatrix} R(p_Z) & t_Z \\ \mathbf{0}^T & 1 \end{bmatrix} B_i \right\|_F^2 \quad (6)$$

A solution to the problem in Equation 6 belongs to the simultaneous category of methods for robot-world hand-eye calibration, because the rotation and translation are solved for at the same time.

We can also break the minimization problem in Equation 6 into two parts via a separable formulation: one part for rotation and the other part to estimate the translation components as captured by Equation 7 and Equation 8, respectively.

$$\operatorname*{arg\,min}_{p_X, p_Z} \sum_{i=0}^{n-1} \left\| R_{A,i} R(p_X) - R(p_Z) R_{B,i} \right\|_F^2 \quad (7)$$
$$\operatorname*{arg\,min}_{t_X, t_Z} \sum_{i=0}^{n-1} \left\| R_{A,i} t_X + t_{A,i} - R(p_Z) t_{B,i} - t_Z \right\|^2 \quad (8)$$

An approximate solution to Equation 7 is first found and then the translation components are found from Equation 8.
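Because Equation 8 is linear in the translations once the rotations are fixed, the second stage reduces to an ordinary linear least-squares problem. A sketch of our own (assuming the rotations are already known exactly, so the recovery is exact on noise-free data):

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_rot():
    """Random orthonormal rotation via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q * np.sign(np.linalg.det(Q))

n = 5
RA = [rand_rot() for _ in range(n)]
RZ = rand_rot()
tX_true, tZ_true = rng.normal(size=3), rng.normal(size=3)
tB = [rng.normal(size=3) for _ in range(n)]
# Translations consistent with Eq. (3): t_A,i = R_Z t_B,i + t_Z - R_A,i t_X.
tA = [RZ @ tB[i] + tZ_true - RA[i] @ tX_true for i in range(n)]

# Rearranged Eq. (8) residual: R_A,i t_X - t_Z = R_Z t_B,i - t_A,i,
# stacked into one linear system M [t_X; t_Z] = rhs.
M = np.vstack([np.hstack([RA[i], -np.eye(3)]) for i in range(n)])
rhs = np.concatenate([RZ @ tB[i] - tA[i] for i in range(n)])
sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)

assert np.allclose(sol[:3], tX_true) and np.allclose(sol[3:], tZ_true)
```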

The left hand and right hand sides of the cost function c_1 represent the transformation from robot base to camera via the world coordinate system (left hand side) and via the end-effector coordinate system (right hand side), as graphically captured in Figure 1. We also explore the use of a slightly different cost function, given in Equation 9:

$$c_2 = \sum_{i=0}^{n-1} \left\| A_i - Z B_i X^{-1} \right\|_F^2 \quad (9)$$

We can find an approximate solution to c_2 simultaneously, similar to what we did with the function c_1. In order to simplify the notation, we let $\tilde{X} = X^{-1}$. Then $\tilde{p}_X$ and $\tilde{t}_X$ are the rotation parameters and translation vector of $\tilde{X}$, respectively. With these substitutions, we have the following minimization problem:

$$\operatorname*{arg\,min}_{\tilde{p}_X, \tilde{t}_X, p_Z, t_Z} \sum_{i=0}^{n-1} \left\| A_i - \begin{bmatrix} R(p_Z) & t_Z \\ \mathbf{0}^T & 1 \end{bmatrix} B_i \begin{bmatrix} R(\tilde{p}_X) & \tilde{t}_X \\ \mathbf{0}^T & 1 \end{bmatrix} \right\|_F^2 \quad (10)$$

As with c_1, it is possible to create a separable formulation by first estimating the rotation components in Equation 11 and then the translation components in Equation 12.

$$\operatorname*{arg\,min}_{\tilde{p}_X, p_Z} \sum_{i=0}^{n-1} \left\| R_{A,i} - R(p_Z) R_{B,i} R(\tilde{p}_X) \right\|_F^2 \quad (11)$$
$$\operatorname*{arg\,min}_{\tilde{t}_X, t_Z} \sum_{i=0}^{n-1} \left\| t_{A,i} - R(p_Z) R_{B,i} \tilde{t}_X - R(p_Z) t_{B,i} - t_Z \right\|^2 \quad (12)$$

In both of the minimization formulations (simultaneous and separable) for c_1 and c_2, the number of parameters to be estimated for X and Z is: six translation components and six or eight rotation parameters, depending on the choice of rotation representation. Approximate solutions to all of the problems proposed in this paper are found with the Levenberg-Marquardt method for non-linear least squares marquardt1963algorithm , and specifically its implementation in Ceres ceressolver . For an initial solution, values are chosen such that R(p_X) and R(p_Z) are identity matrices and t_X and t_Z have all elements zero.
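A minimal sketch of the simultaneous approach with an off-the-shelf solver — here scipy's Levenberg-Marquardt rather than Ceres, with the axis-angle parameterization and synthetic noise-free data; illustrative only, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(3)

def rand_htm():
    """Random homogeneous transformation: QR-based rotation, normal translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    T = np.eye(4)
    T[:3, :3] = Q * np.sign(np.linalg.det(Q))
    T[:3, 3] = rng.normal(size=3)
    return T

X_true, Z_true = rand_htm(), rand_htm()
A = [rand_htm() for _ in range(10)]
B = [np.linalg.inv(Z_true) @ Ai @ X_true for Ai in A]  # noise-free A_i X = Z B_i

def unpack(p):
    """p = [axis-angle for X, t_X, axis-angle for Z, t_Z] -> HTMs X and Z."""
    X, Z = np.eye(4), np.eye(4)
    X[:3, :3] = Rotation.from_rotvec(p[0:3]).as_matrix()
    X[:3, 3] = p[3:6]
    Z[:3, :3] = Rotation.from_rotvec(p[6:9]).as_matrix()
    Z[:3, 3] = p[9:12]
    return X, Z

def residuals(p):
    """Stacked entries of A_i X - Z B_i over all poses (the c_1 residuals)."""
    X, Z = unpack(p)
    return np.concatenate([(Ai @ X - Z @ Bi).ravel() for Ai, Bi in zip(A, B)])

p0 = np.zeros(12)  # identity rotations and zero translations, as in the paper
res = least_squares(residuals, p0, method='lm')
assert res.cost < 0.5 * np.sum(residuals(p0) ** 2)  # LM reduced the cost
```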

### 2.2 Second class: camera reprojection error

In this subsection, we present our second class of cost functions and methods, which are based on the idea of minimizing camera reprojection error in order to find X and Z. The approach presented in this subsection shares many similarities with the camera calibration approach of Zhang in zhang2000flexible , where the extrinsic camera calibration parameters are estimated by minimizing the camera reprojection error. Unlike the first class of methods, this approach is sensitive to the choice of initial solution; briefly, we use the result from the first class of methods as an initial solution, and a more in-depth discussion of initial solution choices can be found in Subsection 3.1.

Before getting into the details of this class of methods, we first mention some preliminaries and notation. Let a three-dimensional point on the calibration object be $\vec{X}_j$; because there are multiple such points, we give a subscript j, j = 0, 1, …, m−1. It is assumed that all of the points are detected in the images used for calibration, so in the case of the chessboard pattern, these points are the corners of the chessboard. When $\vec{X}_j$ is projected to the image from robot position i using the intrinsic camera calibration parameters, we have the reprojected image point $\vec{\tilde{x}}_{ij}$ and a corresponding original image point $\vec{x}_{ij}$. We can represent $\vec{\tilde{x}}_{ij}$ in the context of the robot-world hand-eye calibration problem as follows:

$$\vec{\tilde{x}}_{ij} = f\!\left(k, \left[ Z B_i X^{-1} \right]_{3\times 4} \vec{X}_j \right) \quad (13)$$

where k is a vector containing the intrinsic camera calibration parameters. The bracket notation $[\,\cdot\,]_{3\times 4}$ is used to denote the upper 3×4 sub-matrix of what is inside the bracket. f is the function that transforms world points into image points using k.
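A simplified pinhole sketch of such a projection function f (our own illustration: hypothetical intrinsic values, and lens distortion omitted for brevity):

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal lengths and principal point).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(K, P34, X_world):
    """Apply the 3x4 extrinsic matrix P34 to a homogeneous world point,
    then the intrinsics K, then dehomogenize to pixel coordinates."""
    p = K @ (P34 @ X_world)
    return p[:2] / p[2]

# For illustration, take [Z B_i X^{-1}]_{3x4} to be the identity pose.
P34 = np.hstack([np.eye(3), np.zeros((3, 1))])
X_world = np.array([0.1, -0.05, 2.0, 1.0])  # homogeneous 3D point
x = project(K, P34, X_world)
assert np.allclose(x, [360.0, 220.0])  # 800*0.1/2 + 320 and 800*(-0.05)/2 + 240
```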

Concerning k, we use the parameters from the intrinsic camera calibration matrix as well as radial and tangential distortion parameters. However, other radial and tangential distortion models may be chosen with no change to the method. As with c_2 in Equation 10, to simplify the representation of Equation 13, we substitute for $X^{-1}$ another matrix $\tilde{X}$, which is composed of an orthonormal rotation matrix and a translation vector.

Given the preliminaries discussed above, the reprojection sum of squares error (rsse) is:

$$rsse = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ Z B_i X^{-1} \right]_{3\times 4} \vec{X}_j \right) \right\|^2 \quad (14)$$

and by substituting $\tilde{X} = X^{-1}$ we have:

$$rsse = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ Z B_i \tilde{X} \right]_{3\times 4} \vec{X}_j \right) \right\|^2 \quad (15)$$

We note that in Equations 14 and 15, and for the remainder of this paper, we use the L2 norm for vectors.

Finally, if we substitute in the rotation representations and the translation vectors for the unknown Z and $\tilde{X}$ matrices, the camera reprojection sum of squares error (rsse) minimization problem becomes as given in Equation 16; we refer to this as our first reprojection-error method:

$$\operatorname*{arg\,min}_{\tilde{p}_X, \tilde{t}_X, p_Z, t_Z} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ \begin{bmatrix} R(p_Z) & t_Z \\ \mathbf{0}^T & 1 \end{bmatrix} B_i \begin{bmatrix} R(\tilde{p}_X) & \tilde{t}_X \\ \mathbf{0}^T & 1 \end{bmatrix} \right]_{3\times 4} \vec{X}_j \right) \right\|^2 \quad (16)$$

where $\tilde{p}_X$ and $\tilde{t}_X$ are again the rotation parameter vector and translation vector of $\tilde{X} = X^{-1}$, respectively.

Within the simultaneous formulation of the robot-world hand-eye calibration problem as given in Equation 16, it is also possible to refine the camera intrinsic parameter vector k by treating the elements of k as parameters to be estimated, rather than as constants, in order to produce a local minimum of Equation 17; we refer to this as our second reprojection-error method.

$$\operatorname*{arg\,min}_{\tilde{p}_X, \tilde{t}_X, p_Z, t_Z, k} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ \begin{bmatrix} R(p_Z) & t_Z \\ \mathbf{0}^T & 1 \end{bmatrix} B_i \begin{bmatrix} R(\tilde{p}_X) & \tilde{t}_X \\ \mathbf{0}^T & 1 \end{bmatrix} \right]_{3\times 4} \vec{X}_j \right) \right\|^2 \quad (17)$$

In contrast with the reprojection sum of squared error minimized by both of our reprojection-error methods, it is important to mention that in the camera calibration literature the reprojection root mean squared error (rrmse) is more common, as shown in Equation 18. The rrmse represents the average Euclidean distance between detected and reprojected calibration pattern points in the image plane, and its units are pixels. However, since the parameters that result in a minimum of rsse also result in a minimum of rrmse, rsse is used as a cost function in this work.

$$rrmse = \sqrt{ \frac{1}{mn} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - \vec{\tilde{x}}_{ij} \right\|^2 } \quad (18)$$

### 2.3 Extending the two proposed classes of methods to multiple cameras

The multiple camera case requires estimating one HTM X and multiple HTMs Z_d. Let the number of cameras be q, and let A_{i,d} be the transformation from the world coordinate frame to camera d at the i-th robot position. Then the relationships between the matrices in the context of the robot-world hand-multiple-eyes calibration problem are formulated as:

$$A_{i,0} X = Z_0 B_i \quad (19)$$
$$A_{i,1} X = Z_1 B_i \quad (20)$$
$$\vdots$$
$$A_{i,q-1} X = Z_{q-1} B_i \quad (21)$$

for all i, 0 ≤ i ≤ n−1. All of the cost functions presented thus far in Subsections 2.1 and 2.2 can therefore be represented within this multiple camera context. For instance, Equation 6 in the multiple camera context becomes:

$$\operatorname*{arg\,min}_{p_X, t_X, p_{Z,0}, t_{Z,0}, \ldots, p_{Z,q-1}, t_{Z,q-1}} \sum_{d=0}^{q-1} \sum_{i=0}^{n-1} \left\| A_{i,d} \begin{bmatrix} R(p_X) & t_X \\ \mathbf{0}^T & 1 \end{bmatrix} - \begin{bmatrix} R(p_{Z,d}) & t_{Z,d} \\ \mathbf{0}^T & 1 \end{bmatrix} B_i \right\|_F^2 \quad (22)$$

where p_{Z,d} and t_{Z,d} are the rotation parameterization and translation vector for the d-th camera's HTM, Z_d. Separable versions follow using the same pattern as in the formulations of the two classes of methods found in the previous subsections.

When using multiple cameras, it is common that, in order to gain a wide variety of views of the calibration object, not every camera will view the calibration object at every robot position, particularly if the cameras are far apart or have different lenses and imaging sensor sizes. We now describe a generic method, used in our implementation, to weight the influence of each camera in the cost function. We desired that each camera have an equal influence, but this weighting can be adjusted according to other needs.

We let the set of robot positions in which camera d can view the calibration object be S_d, and we determine the minimum size of S_d over all cameras. An individual weight w_d for camera d can then be set from these set sizes. Furthermore, if we let the cost function that is to be minimized be g(·), the formulation of the robot-world hand-multiple-eyes calibration problem such that missing observations are handled is given by Equation 23.

$$\operatorname*{arg\,min}_{p_X, t_X, p_{Z,0}, t_{Z,0}, p_{Z,1}, t_{Z,1}, \ldots, p_{Z,q-1}, t_{Z,q-1}} \sum_{d=0}^{q-1} \sum_{i \in S_d} w_d \, g(p_X, t_X, p_{Z,d}, t_{Z,d}) \quad (23)$$

The problem in Equation 23 can be applied to the first class of cost functions, as well as to the first reprojection-error method in the second class. For the second reprojection-error method, we also estimate intrinsic camera calibration parameters for all cameras, which follows directly from Equation 23.
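As an illustration of one way such weights could be chosen (an assumption of ours, not necessarily the paper's exact formula), scaling each camera's terms by the minimum set size divided by its own set size gives every camera an equal total contribution:

```python
# Hypothetical view sets S_d: robot positions at which each camera saw the target.
S = {0: [0, 1, 2, 3, 4, 5], 1: [1, 2, 4], 2: [0, 3, 5, 6]}

s_min = min(len(positions) for positions in S.values())  # smallest set size
w = {d: s_min / len(positions) for d, positions in S.items()}

# Each camera's weighted term count is equal: w_d * |S_d| = s_min for all d.
assert all(abs(w[d] * len(S[d]) - s_min) < 1e-12 for d in S)
```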

## 3 Performance Evaluation on real datasets

The behavior of the robot-world hand-eye calibration methods is demonstrated on eight datasets in real laboratory and field settings; all of the datasets are available from tabb2017solving_dataset . These datasets represent different combinations of robots, cameras, and lenses; some of the datasets have multiple cameras. Descriptions of the datasets are given in Table 1. The table also lists the number of robot positions (n) used for the calibration and the error (in pixels) from the camera calibration step using Zhang's method zhang2000flexible to estimate the A_i matrices. For datasets 7 and 8, the third camera is a commodity RGB-D (Red-Green-Blue-Depth) camera; we calibrated only the color camera in this work. Figure 6 shows the arrangement of cameras in dataset 7 as well as sample images from one position of the robot. Figure 9 shows both of the robots used in this work; two 6-axis robot arms were used, a Denso VS-6577GM-B robot arm rigidly mounted to the floor and a Denso VM-60BIG robot arm mounted on a mobile platform.

In the rest of this section, we discuss some of the implementation details of our collection of proposed methods, and then discuss the error metrics used in reporting the performance results of the comparison of the proposed methods.

### 3.1 Implementation details

For the two proposed classes of cost functions, we use the implementation of the Levenberg-Marquardt method found in the software package Ceres ceressolver . All of the results shown in this paper were generated on a workstation with one 12-core processor and 192 GB of RAM. The camera calibration was carried out using the OpenCV library’s camera calibration functions opencv_library .

For the first class of cost functions, using simultaneous and separable versions of c_1 and c_2, the initial solutions of X and Z are set such that the corresponding directed cosine matrices are identity matrices and the translation components are set to zero. (Various initial solutions were tested, and there was small or negligible difference in solution quality versus using identity matrices for the rotations and translation components with all elements zero. For this reason, we conclude that, for the experiments covered in this paper, the first class of methods is not sensitive to initial solutions.) As mentioned previously, the second class of cost functions is sensitive to initial solutions. To find an approximate solution to the minimization problem of the first reprojection-error method shown in Equation 16, we used the simultaneous solution of the c_2 cost function (the problem specified in Equation 10) as an initial solution. To find an approximate solution to the problem of the second reprojection-error method shown in Equation 17, the solution from Equation 16 was used as the initial solution.

To present a comparative study against our collection of cost functions and methods, we implemented seven comparison methods: Zhuang et al. Zhuang1994simultaneous , Dornaika and Horaud dornaika1998simultaneous (closed form and iterative), Hirsh et al. hirsh2001iterative , Li et al. li2010simultaneous (dual quaternion and Kronecker product), and the work of Shah Shah2013Solving . Concerning the implementation details of the Li et al. li2010simultaneous methods, we draw the reader's attention to the fact that the computation of two coefficients in the dual quaternion method was not specified clearly by the authors; we therefore used a Levenberg-Marquardt method to find those values. For the conversion of the rotation matrices found by the Kronecker product method li2010simultaneous into orthonormal matrices, we used the singular value decomposition.

### 3.2 Error metrics

In presenting the comparison results of both our collection of methods and the methods of others, we used the following error metrics: two types of mean rotation error, the mean translation error, the mean combined rotation and translation error, and the reprojection root mean squared error. Since we are interested in using our methods for reconstruction and vision tasks as indicated in Subsection 1.2, we also introduced a sixth metric related to reconstruction accuracy. The following subsections describe each error type.

#### 3.2.1 The Mean Rotation Error

We list two rotation errors. The first, which we denote Rotation Error 1 (e_{R1}), is derived from Equation 2 and shown in Equation 24.

$$e_{R1} = \frac{1}{n} \sum_{i=0}^{n-1} \left\| R_{A,i} R_X - R_Z R_{B,i} \right\|_F^2 \quad (24)$$

However, the e_{R1} value may not be particularly meaningful when evaluating the methods, and it has no units. For this reason, we have a second rotation error, e_{R2}, that is representative of the difference between the left hand and right hand sides of Equation 2. We find the relative rotation between the two sides (Equation 25) and then compute the angle of that relative rotation using the axis-angle representation, represented as angle(R_i) in Equation 26; in the results shown in Section 5, its units are degrees.

$$R_i = \left( R_Z R_{B,i} \right)^T \left( R_{A,i} R_X \right) \quad (25)$$
$$e_{R2} = \frac{1}{n} \sum_{i=0}^{n-1} \operatorname{angle}(R_i) \quad (26)$$
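The angle in Equation 26 can be extracted from the trace of the relative rotation, since trace(R) = 1 + 2 cos(θ); a small illustrative check of our own:

```python
import numpy as np

def rotation_angle_deg(R):
    """Angle of rotation of an orthonormal matrix R, in degrees,
    via trace(R) = 1 + 2 cos(theta)."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)  # clip guards against roundoff
    return np.degrees(np.arccos(c))

# A rotation of 30 degrees about the z axis should report an angle of 30 degrees.
th = np.radians(30.0)
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th), np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
assert np.isclose(rotation_angle_deg(Rz), 30.0)
assert np.isclose(rotation_angle_deg(np.eye(3)), 0.0)  # identity: zero angle
```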

#### 3.2.2 The Mean Translation Error

The translation error is derived from Equation 3 and shown in Equation 27, where the units are in millimeters to be consistent with our selection of units for camera calibration error.

$$e_t = \frac{1}{n} \sum_{i=0}^{n-1} \left\| \left( R_{A,i} t_X + t_{A,i} \right) - \left( R_Z t_{B,i} + t_Z \right) \right\|_2 \quad (27)$$

#### 3.2.3 The Mean Combined Rotation and Translation Error

This error is shown in Equation 28 and has no units.

$$e_C = \frac{1}{n} \sum_{i=0}^{n-1} \left\| A_i X - Z B_i \right\|_F^2 \quad (28)$$

#### 3.2.4 The Reprojection Root Mean Squared Error (rrmse)

This error is shown in Equation 29 and its units are pixels.

$$rrmse = \sqrt{ \frac{1}{mn} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ Z B_i X^{-1} \right]_{3\times 4} \vec{X}_j \right) \right\|^2 } \quad (29)$$

#### 3.2.5 The Reconstruction Accuracy Error

Since we are also interested in reconstruction accuracy when using a robot to acquire images with associated camera calibration information, our final metric aims to represent the reconstruction accuracy.

The idea behind this metric is: given correspondences from different images acquired at different robot positions, estimate the world points that generated those correspondences, and compute the difference between the estimated and true world points to represent reconstruction accuracy. We borrow some notation from Section 2.2: $\vec{x}_{ij}$ is the image point from the i-th robot position corresponding to the three-dimensional point $\vec{X}_j$. First, we estimate the most likely three-dimensional point, represented by $\hat{Y}_j$, that generated the image points, by solving the minimization problem in Equation 30.

$$\hat{Y}_j = \operatorname*{argmin}_{Y} \sum_{i=0}^{n-1} \left\| \vec{x}_{ij} - f\!\left(k, \left[ Z B_i X^{-1} \right]_{3 \times 4} Y \right) \right\|^2 \tag{30}$$

Then, the reconstruction accuracy error (rae) is the average Euclidean distance between the estimated points $\hat{Y}_j$ and the calibration object points $\vec{X}_j$:

$$\mathrm{rae} = \frac{1}{m} \sum_{j=0}^{m-1} \left\| \hat{Y}_j - \vec{X}_j \right\|_2 \tag{31}$$
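Equation 30 is a nonlinear least-squares problem over reprojection error. As an illustrative stand-in (not the paper's method), the sketch below triangulates each point linearly with the direct linear transform (DLT), then averages the distances to the true points as in Equation 31; all names are ours:

```python
import numpy as np

def triangulate_dlt(obs, P_list):
    # Linear DLT triangulation: a stand-in for the nonlinear estimate of Eq. 30.
    # obs[i]: observed pixel (x, y) at position i; P_list[i]: 3x4 camera matrix.
    rows = []
    for (x, y), P in zip(obs, P_list):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Y = Vt[-1]                      # null vector = homogeneous world point
    return Y[:3] / Y[3]

def rae(obs_per_point, P_list, points_3d):
    # Mean Euclidean distance between estimated and true world points (Eq. 31).
    ests = [triangulate_dlt(obs, P_list) for obs in obs_per_point]
    return float(np.mean([np.linalg.norm(Y - Xj[:3])
                          for Y, Xj in zip(ests, points_3d)]))
```

With noise-free observations from two distinct camera poses, the triangulated points match the true points and rae is zero.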

## 4 Performance Evaluation on simulated datasets

The comparison methods, as well as a subset of our proposed methods, are evaluated on simulated datasets. For the simulated datasets contained in this paper, only $X$, $Z$, $A_i$, and $B_i$ are generated. As a result, we do not evaluate our proposed second class of methods, since doing so would also require simulating camera models. The protocol for generating the simulated datasets closely follows that of Shah Shah2013Solving , Section 5.1. Briefly, the rotation components of $X$, $Z$, and $A_i$ are set as random rotation matrices, and the translation components are drawn from the standard uniform distribution using a random number generator, for $i = 0, \ldots, n-1$. Then, the homogeneous matrix $B_i$ is computed as:

$$B_i = Z^{-1} A_i X \tag{32}$$

Noise is then introduced into $A_i$, only in the rotation component, by converting the rotation matrix to the quaternion representation, adding random noise, and then converting back to the matrix representation. A parameter controls the magnitude of the random noise added to $A_i$; it is evenly spaced over 19 values along its interval. Data was generated for ten trials of the experiment.
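The generation protocol above can be sketched as follows. This is our own illustration of the steps (random rotations via normalized Gaussian quaternions, exact $B_i$ from Equation 32, then quaternion noise on $A_i$), not Shah's or the authors' code, and it uses SciPy's `Rotation` for the quaternion conversions:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def random_htm(rng):
    # Random 4x4 HTM: uniform random rotation (normalized Gaussian quaternion)
    # and translation drawn from the standard uniform distribution.
    H = np.eye(4)
    H[:3, :3] = Rotation.from_quat(rng.standard_normal(4)).as_matrix()
    H[:3, 3] = rng.uniform(size=3)
    return H

def simulate(n, noise, rng):
    # Sketch of the simulation protocol: random X, Z, and hand poses A_i;
    # exact B_i = Z^-1 A_i X (Eq. 32); then quaternion noise added to the
    # rotation component of A_i.
    X, Z = random_htm(rng), random_htm(rng)
    A, B = [], []
    for _ in range(n):
        Ai = random_htm(rng)
        B.append(np.linalg.inv(Z) @ Ai @ X)          # noise-free B_i
        q = Rotation.from_matrix(Ai[:3, :3]).as_quat()
        q = q + noise * rng.standard_normal(4)
        Ai[:3, :3] = Rotation.from_quat(q).as_matrix()  # renormalized
        A.append(Ai)
    return A, B, X, Z
```

With `noise = 0`, the generated data satisfies $A_i X = Z B_i$ exactly, up to floating-point round-trip error.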

The protocol above sets the translation components within the interval $[0, 1)$, which reflects a bias towards a particular choice of units. Depending on the experimental setup and the choices made, this interval may or may not be representative of that setup. Consequently, we denote the simulated datasets using Shah's protocol Shah2013Solving as Simulated Dataset I. We generated another dataset using the same protocol, except that the translation components were drawn from a larger interval, and refer to this as Simulated Dataset II. We chose that interval since in our real experiments we use millimeters for translation components, to be consistent with camera calibration; of course, the use of meters would be another choice, and that choice is reflected in Simulated Dataset I.

### 4.1 Error metrics for simulated datasets

Since the ground truth $X$ and $Z$ are known for the simulated datasets, for each calibration method the differences between the estimated and ground-truth rotation and translation components are computed. Again, we closely follow Shah's presentation Shah2013Solving in our description of the error metrics for the simulated experiments.

#### 4.1.1 Rotation error

Let $\hat{R}_X$ be the rotation matrix of $X$ estimated by a calibration method for a simulated dataset, and let $R_X$ be the ground truth rotation matrix for $X$. Then, the error in the rotation component of $X$ is $e_{R_X}$:

$$e_{R_X} = \left\| \hat{R}_X - R_X \right\|_F \tag{33}$$

and the same formula is used to compute the rotation error for $Z$, $e_{R_Z}$.

#### 4.1.2 Translation error

Let $\hat{t}_X$ be the translation vector of $X$ estimated by a calibration method for a simulated dataset, and let $t_X$ be the ground truth translation vector for $X$. Then, the error in the translation component of $X$ is $e_{t_X}$:

$$e_{t_X} = \left\| \hat{t}_X - t_X \right\| \tag{34}$$

and the same formula is used to compute the translation error for $Z$, $e_{t_Z}$.
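Equations 33 and 34 are direct norm computations; a minimal sketch with our own function names:

```python
import numpy as np

def rotation_error(R_hat, R_true):
    # Frobenius-norm difference between estimated and ground-truth
    # rotation matrices (Eq. 33).
    return float(np.linalg.norm(R_hat - R_true, ord='fro'))

def translation_error(t_hat, t_true):
    # Euclidean distance between estimated and ground-truth
    # translation vectors (Eq. 34).
    return float(np.linalg.norm(np.asarray(t_hat) - np.asarray(t_true)))
```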

## 5 Experimental Results

In this section, we first show and discuss comparison results between our methods and the seven methods we referred to in the previous section on real datasets. Then, we show and discuss comparison results using the two simulated datasets.

Tables 2 through 9 show the results using the five error metrics described in Subsection 3.2 and the eight real datasets described in Table 1. (While the use of tables is perhaps not the easiest format for the reader, the huge range of values for each error metric made other types of representation, such as graphs, infeasible.) These comparison results also include the running time, in seconds, of our implementation of each method.

In Tables 2–6, the first seven rows correspond to the seven comparison methods from the literature, listed in chronological order: Zhuang et al. Zhuang1994simultaneous ; Dornaika and Horaud dornaika1998simultaneous (closed form and iterative); Hirsh et al. hirsh2001iterative ; Li et al. li2010simultaneous (dual quaternion and Kronecker product); and the method of Shah Shah2013Solving . The remaining rows correspond to our proposed methods. (Only the first author is mentioned in the tables for brevity.) The comparison methods were not evaluated on Datasets 6, 7, and 8, and are not shown in Tables 7–9, because those datasets have more than one camera, and to the best of our knowledge there are currently no other robot-world hand-eye calibration methods for multiple cameras.

Results for Simulated Datasets I and II are shown graphically in Figures 10–13. The seven comparison methods were evaluated, as well as the first class from our proposed collection of methods. Within that first class, the results were very similar for the different choices of rotation parameterization. Consequently, we show only one rotation parameterization choice, that of Euler angles, for better readability.

### 5.1 Discussion of comparison methods on real datasets

We mention here that the Dornaika and Horaud iterative method dornaika1998simultaneous does not converge for any of the datasets using the Ceres solver ceressolver , where derivatives are computed automatically by the software. In comparison, in our previous work Tabb2015Parameterizations , which used the levmar solver lourakis04LM , the Dornaika and Horaud iterative method converged after only 2-3 iterations once termination conditions were met; however, the results were very similar to those produced by the authors' closed-form method (which served as the iterative method's initial solution). In summary, different solvers may give different results, but our main conclusion about this method is that the large penalty terms used to enforce orthonormality of the rotation matrices result in either very little change from the initial solution or non-convergent behavior.

The methods of Hirsh et al. hirsh2001iterative and Shah Shah2013Solving also behave similarly to one another. Both consistently produce the lowest rotation error values, which are usually the same up to 5 digits of precision. In contrast, the Dornaika and Horaud closed-form method dornaika1998simultaneous usually has one of the highest rotation errors. With regard to the mean translation error ($e_t$) and the combined rotation and translation error ($e_C$), Shah's method generally performs well relative to the other comparison methods, producing some of the lowest values on all datasets until Dataset 5 (Table 6). With respect to the reprojection root mean squared error (rrmse), the Hirsh et al. and Shah methods both generally perform best, with Shah's method resulting in lower rrmse than the method of Hirsh et al., though this relationship is inverted in Datasets 1 and 5. For example, the method of Hirsh et al. has the lowest rrmse in Dataset 1, but a large rrmse in Dataset 5, though still lower than that of Shah's method.

The first conclusion drawn from Tables 2–6, which correspond to the datasets with a single camera, is that the behavior of many methods is dataset dependent. This dataset dependency may result from many factors, including the positions chosen for calibration, the robot model, and the way in which the robot is mounted, which may introduce new errors. Recall from Section 3 that the DENSO VM-60BIG from Datasets 3 and 4 is mounted on a mobile unit with tires, so as the robot moves, there may be some movement of the robot base.

The dataset dependency of the comparison methods is illustrated with a selection of examples. For instance, the combined rotation and translation error ($e_C$) of Zhuang et al. Zhuang1994simultaneous is the lowest of the group of comparison methods in Dataset 1, in the middle range in Dataset 2, and comparatively high in Datasets 3, 4, and 5. The Kronecker product method of Li et al. li2010simultaneous performed well relative to the comparison methods on Dataset 5 (Table 6), where its combined rotation and translation error was substantially lower than that of the other methods. However, on no other single-camera dataset does the Li et al. Kronecker product method have the lowest combined rotation and translation error relative to the other comparison methods.

### 5.2 Discussion of the proposed collection of methods on real datasets

#### 5.2.1 First class of cost functions

Within the first class of cost functions, the methods based on minimizing the combined formulation (simultaneous and separable, under different choices of parameterization of the rotation components) tend to have a lower combined rotation and translation error ($e_C$), while the methods based on minimizing the other cost function of the class have lower translation errors in comparison. Exceptions are Datasets 7 and 8, which have very similar values of $e_t$ and $e_C$ for the separable versions (see Tables 8 and 9).

Specifically, within the first class of cost functions, the separable versions of the methods tended to have lower rotation errors but higher translation errors ($e_t$), combined rotation and translation errors ($e_C$), and reprojection errors (rrmse). This is particularly obvious in Dataset 5 (Table 6), where the reprojection root mean squared errors of the separable versions are all substantially greater than those of the simultaneous versions. The separable methods also have a much greater rrmse than the simultaneous versions in Dataset 6 (Table 7).

#### 5.2.2 Second class of cost functions

As expected when using reprojection error as a cost function rather than the relationship $AX = ZB$ (and its variations), the second class of methods consistently gives results with lower values of rrmse than the first class of methods or the comparison methods. On the other hand, the rotation, translation, and combined rotation and translation errors of the second class of methods are greater than those of the first class of methods, as well as some of the comparison methods. In general, the difference between treating the intrinsic camera calibration values as constants and treating them as parameters to be estimated is small.

#### 5.2.3 Effect of rotation representation

It seems that the choice of rotation representation (Euler angles, axis-angle, or quaternion) has only a small effect on the results, and those differences may depend on the particular solver used. For instance, in our prior work Tabb2015Parameterizations , the simultaneous version of the cost function was used with the Euler angles parameterization (labeled Euler Parameterization I in that work) using the levmar solver lourakis04LM and a provided Jacobian matrix. In that work, when tested on Dataset 1, the rrmse was 3.62709 pixels. Using the Ceres solver ceressolver with automatic Jacobian matrices on the same dataset, the rrmse values were 12.13, 3.62686, and 3.62641 pixels for the Euler angles, axis-angle, and quaternion versions, respectively (Table 2). This difference in rrmse occurs even in cases where the combined rotation and translation error ($e_C$) of the three rotation representations, using Ceres, differs only marginally. Usually, the differences between the rotation representations on all of the metrics are relatively small, though exceptions are the separable methods on Datasets 5 and 6 (Tables 6 and 7).

### 5.3 Reconstruction accuracy performance for the proposed collection of methods and comparison methods

While reprojection root mean square error is a good indicator of relative reconstruction accuracy error (), there is not a monotonic relationship of to . We observed that the version from the second class in our collections returns slightly lower values of than . Concerning reconstruction accuracy error of the method versus the method, increases slightly in datasets with better selections of robot positions (Datasets 1, 2, 3, 4, 7, 8) using and decreases slightly in datasets with poorer selections of robot positions (Datasets 5 and 6) when using method as compared to the method. Concerning our first class of cost functions ( and ), we observed that using the simultaneous method always gives a lower . In particular, the simultaneous version in Dataset 2 has mm for all three rotation representations, and this pattern ( mm) is repeated in the separable versions of as compared to ( mm, mm for separable and simultaneous, respectively). Additionally, it is observed that the separable versions of both and perform well in some datasets with respect to and worse in others ( perform well in Dataset 2 ( mm), and perform poorly in Dataset 5 ( mm)).

Within the group of comparison methods, the methods of Hirsh et al. hirsh2001iterative and Shah Shah2013Solving consistently have the lowest values of rotation error, and both perform generally well with respect to rae. In general, Shah's computation of translation vectors results in lower translation error ($e_t$) and rae as compared to the method of Hirsh et al., though this relationship is inverted in Dataset 5. However, none of the comparison methods performs as well as our proposed second class of methods with respect to reconstruction accuracy error, though for some datasets the methods of Hirsh et al. and Shah are comparable to our first class of proposed methods.

### 5.4 Discussion of the simulated experiments

The examination of the Simulated Dataset I and II results offers some additional insights into the behavior of the comparison and proposed methods, since the ground truth $X$ and $Z$ are known. The Simulated Dataset I results for the rotation errors of $X$ and $Z$ show that the Shah and Li et al. Kronecker product methods consistently produce the rotation matrices closest to the ground truth, while the rest of the methods have very similar error values. The Shah and Li et al. Kronecker product methods also have the lowest translation errors for $X$ and $Z$ in Simulated Dataset I, while the rest of the methods have similar error values. One exception is the Li et al. dual quaternion method, which has larger errors than the rest of the methods for the translation components of $X$ and $Z$.

Simulated Dataset II differs from Simulated Dataset I in the scaling of the translation components, and this difference affects the performance of the methods. Concerning rotation error, as before the Shah and Li et al. Kronecker product methods produce the lowest values, but in Simulated Dataset II the simultaneous versions from our first class of proposed methods also produce comparable values. For translation error, the Li et al. Kronecker product method and the simultaneous versions from our first class of proposed methods produce very low error values, while the values of the remaining methods vary considerably.

From these experiments, it is clear that some methods are sensitive to the scaling and distribution of the translation components, and a different method of dataset generation may yield different observations. The Li et al. Kronecker product method appears to be a robust choice in both settings, the Shah method in the Simulated Dataset I setting, and our first class of proposed methods in the Simulated Dataset II setting.

### 5.5 Recommendations and Summary

When choosing a robot-world, hand-eye calibration method, we hope that our experiments can help researchers select the method best suited to a particular task or application. For instance, the methods of Hirsh et al. hirsh2001iterative and Shah Shah2013Solving , together with the separable versions from the first class of our proposed methods, consistently give the lowest rotation errors, whereas the simultaneous versions return the lowest combined rotation and translation error ($e_C$). For the best rrmse and rae without using camera reprojection error in the cost function, the simultaneous versions from our first class are the best choice. When using camera reprojection error as the cost function, treating the intrinsic camera calibration parameters as constants is generally better with a higher-quality camera calibration, while treating them as parameters is better with a poorer-quality intrinsic camera calibration. Within the collection of methods we propose, the choice of rotation representation does not greatly affect the results when using a modern solver such as Ceres ceressolver .