The problem of estimating the pose of a camera (or, dually, of a 3D object) from a set of 2D projections in a single view has been widely studied in the computer vision literature [Szeliski2011]. The “minimal” problem, i.e., the one requiring the minimal amount of information, is known as the perspective-3-point problem (P3P) and consists in recovering the pose of a calibrated camera from three 3D-2D point correspondences. Many solutions are available for the general case when more information is available. When the environment in the scene can be controlled, artificial features with known positions are very often deployed in the scene. They are used in a wide range of applications, especially when a reliable reference is needed, e.g., in cluttered or textureless environments. The most popular artificial features are probably coplanar features [Fiala2010] whose layout in 2D space defines a so-called planar marker. The mapping between a planar marker and its image is a 2D projective transformation known as the (world-to-image) homography, which can be estimated from at least four world-to-image correspondences (the simplest planar marker is a square). Once the camera is calibrated, the decomposition of the homography matrix allows the recovery of the pose of the camera (or, dually, that of the plane).
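As an illustration of this classical pipeline, the homography estimation and its decomposition into a pose can be sketched as follows. This is a minimal DLT-based sketch assuming a known calibration matrix K; the function names are ours, not a reference implementation:

```python
import numpy as np

def homography_dlt(world_pts, img_pts):
    """Estimate the world-to-image homography from >= 4 point pairs (DLT)."""
    A = []
    for (X, Y), (u, v) in zip(world_pts, img_pts):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)          # null vector = homography, up to scale
    return H / H[2, 2]

def pose_from_homography(K, H):
    """Decompose H ∝ K [r1 r2 t] into a rotation R and a translation t."""
    M = np.linalg.inv(K) @ H
    M /= np.linalg.norm(M[:, 0])      # scale fixed by ||r1|| = 1
    if M[2, 2] < 0:                   # keep the plane in front of the camera
        M = -M
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)       # re-project onto SO(3)
    return U @ Vt, t
```

On noise-free correspondences of a square marker, the decomposition recovers the simulated rotation and translation exactly (up to numerical precision).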
Other well-known artificial markers that have recently been investigated again are those consisting of coplanar circles [Wu2004, Gurdjos2006, Kim2005, Bergamasco2011]. The knowledge of at least two circle images (without any information on their parameters in the support plane) allows the computation of a world-to-image homography without ambiguity for all spatial configurations of two circles except one [Gurdjos2006].
Given a single circle image, it is well known that a twofold solution exists for the normal to the support plane (and so for the pose), but only if the camera is calibrated [Wu2004]. In this work, our contribution is to push beyond this limit by dealing with the case of an uncalibrated camera seeing one circle. Our starting point is the surprising observation, learned from empirical work, that a very approximate calibration can lead to accurate circle pose estimation. Our idea is to use default intrinsics, by designing a generic camera model delivering a default focal length based on the off-line calibration of several smartphone cameras.
Our first contribution is to run extensive experiments that assess how the inaccuracy of the calibration impacts the quality of the pose estimation. We found out that exact calibration may not be required, as small variations of the focal length do not affect the reprojection error of other reference coplanar points, especially when the marker is far from the camera. Our second contribution is to provide a new geometrical framework in which to state the pose problem, so that the issue of how to remove the twofold ambiguity can be thoroughly investigated.
We review the related works in section II. Then, in section III, we recall the problem of recovering the pose from the projection of a circle, before introducing the proposed solution in section IV. The idea is to introduce a new way of computing the vanishing line (dual to the plane normal) from one circle image. As the general method leads to two possible solutions, we show how, under some assumptions about the geometric configuration, we can recover the correct one. Then, as we suppose that we work with uncalibrated images, we explain how we select parameter values to obtain what we call default camera intrinsic parameters. Finally, in section V, we evaluate our method in the context of augmented reality.
II Related Work
Many existing works use a set of features encoded in a planar pattern to simplify the pose estimation. Fiala et al. introduced a fiducial system [Fiala2010] and proposed a special planar square marker. Recent efficient algorithms allow ellipses to be detected precisely and, as a consequence, circles have become features of interest. The four projective and affine parameters of the world-to-image homography (the remaining four define a similarity on the world plane [RichardHartley2003, p42]) can be recovered by detecting the images of two special points of the support plane, known as circular points (e.g., see [RichardHartley2003, p52-53]), which are common to all circles. Gurdjos et al. [Gurdjos2006] relied on the notion of pencil of circle images to formulate the problem of detecting the images of the circular points as a problem of intersection of lines, obtained from the degenerate members of the pencil. Kim et al. [Kim2005] proposed algebraic and geometric solutions in the case of concentric circles. Calvet et al. [Calvet2016] described a whole fiducial system using concentric circles which allows the position of the image of the circles’ common centre to be accurately detected under highly challenging conditions. In the same vein, Huang et al. [Huang2015] proposed to use the common self-polar triangle of concentric circles.
When using circular markers, it is also possible to simplify the camera model so that it depends on a sole focal length parameter. Chen et al. [Chen2004a] autocalibrate the focal length using two or more coplanar circles. The problem to solve involves two kinds of unknowns: an ellipse and the focal length. Two correspondences between a circle and an ellipse are then necessary to estimate the focal length. Based on the same method, Bergamasco et al. [Bergamasco2011] designed a marker composed of small circles spread on the edge of two or more concentric rings. The image of each circle is used in a voting scheme to estimate the focal length and the images of the external rings.
Two circles on a planar marker (except if one encloses the other) are the minimum needed to fully estimate the homography without any other assumption. However, in some applications, e.g., those dealing with concentric circles, detecting the images of two or more circles can be tricky: first because the lack of points induces an inaccurate estimation, and secondly because it is time consuming. When the camera has already been calibrated, it is possible to compute the homography from one circle image up to a twofold ambiguity. Pagani et al. [Pagani2011] introduced a method quite similar to the solution proposed by Chen et al. [Chen2004a], where the ambiguity is resolved by minimizing, over all possible poses, a distance between the rectified image of the marker and the expected pattern.
III Pose estimation from the image of one circle
We review here some geometrical background on the problem of pose estimation from the image of a single circle. We consider a Euclidean projective camera, represented by a 3×4 matrix P = K[R | t], where the rotation matrix R ∈ SO(3) (the 3D rotation group) and the translation vector t describe the pose of the camera, i.e., respectively its orientation and position in the object 3D frame. The upper triangular order-3 matrix K is the calibration matrix as defined in [RichardHartley2003, page 157].
Assume that π is a plane with equation Z = 0 in the world frame. The pose of π in the camera frame is given by the vector [n^T, d]^T, where n = r3, the third column of R, defines the unit normal of π, and d is the orthogonal distance from the camera centre to π. The restriction to π of the projection mapping is a homography whose matrix writes H = K[r1 r2 t], where r1 and r2 are the first two columns of R. In the projective plane, any conic can be represented in 2D homogeneous coordinates by a real symmetric order-3 matrix. Under perspective projection, any circle of π, assuming its quasi-affine invariance [RichardHartley2003, p515], i.e., that all its points lie in front of the camera, is mapped under the homography to an ellipse by the projection equation E ∝ H^{-T} C H^{-1}, where C (an order-3 real symmetric matrix) is the circle matrix and E is the ellipse matrix.
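As a sanity check of the projection equation E ∝ H^{-T} C H^{-1}, one can verify numerically that points on the circle map onto the computed ellipse. A minimal numpy sketch, where the helper names and the example homography are ours:

```python
import numpy as np

def circle_matrix(cx, cy, r):
    """Homogeneous matrix of the circle (x - cx)^2 + (y - cy)^2 = r^2."""
    return np.array([[1.0, 0.0, -cx],
                     [0.0, 1.0, -cy],
                     [-cx, -cy, cx * cx + cy * cy - r * r]])

def project_conic(H, C):
    """Image of the conic C under the homography H:  E ∝ H^{-T} C H^{-1}."""
    Hinv = np.linalg.inv(H)
    E = Hinv.T @ C @ Hinv
    return E / np.linalg.norm(E)       # fix the arbitrary scale
```

A point x on the circle (x^T C x = 0) then satisfies (Hx)^T E (Hx) = 0 on the computed ellipse.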
For reasons that will become clearer later, we want to parameterize the homography H from only the knowledge of the circle image E and the vanishing line of π. Let S be a similarity of the world plane that maps the circle onto the unit circle centred at the origin, and let T be a similarity of the image plane that puts E into a canonical diagonal form. Using an approach similar to [Calvet2016], it can be shown that, under the assumption of a camera with square pixels, H admits a closed-form expression in terms of S, T and the vanishing line.
Note that the matrices S and T can be completely determined from the circle image E and the vanishing line, except for an unknown 2D rotation around the circle centre on π. Recovering this rotation is not the goal of this paper; a simple solution, such as placing a visible mark on the edge of the marker, generally works well.
Our main task will thus be to recover the vanishing line of the plane, as explained in the sequel. Note that the image of the circle centre is the pole of the vanishing line w.r.t. the dual ellipse of E.
IV Support plane’s vanishing line estimation
We warn the reader that the parts written in italics in this section require proofs that are not provided due to lack of space. All proofs will appear in an extended version of the paper.
IV-A A twofold solution in the calibrated case
In the case of a calibrated image, computing the pose of the support plane is equivalent to recovering the vanishing line of π. Let C be the matrix of a circle lying on a plane π, and let A be the matrix of the back-projection onto π of the image of the absolute conic [RichardHartley2003, p. 81], denoted ω, where ω = (KK^T)^{-1}. It is easy to show that A also represents a virtual circle (as does ω); virtual conics have positive definite matrices and hence no real points on them.
Let {λ1, λ2, λ3} denote the set of generalized eigenvalues of the matrix-pair (A, C), i.e., the three roots of the characteristic equation det(A − λC) = 0. The set of matrices {A − λC} defines a conic pencil [Gurdjos2006] which includes three degenerate conics with matrices Dk = A − λkC. These rank-2 matrices represent line-pairs and have the form D = lm^T + ml^T, where l and m are the vectors of the two lines. Such a line-pair matrix can easily be decomposed and the vectors of its lines recovered, albeit it is impossible to distinguish l from m. It can be shown that the projective signatures of the three degenerate members are always (1,1), (2,0) and (2,0); the signature of a conic counts the positive and negative eigenvalues of its (real) matrix, taken up to a global sign, and is left unchanged by projective transformations. Assume, without loss of generality, that D1 is the degenerate conic with signature (1,1). A first key result is that D1 is a pair of two distinct real lines, one of which is the line at infinity l∞; the other one is denoted by Δ. The other two degenerate conics, D2 and D3, with signature (2,0), are pairs of two conjugate complex lines. Consequently, the three (so-called) base points B1, B2, B3, where the lines in a pair meet, are real. Moreover, their vectors are the generalized eigenvectors of (A, C) and satisfy Bi^T C Bj = Bi^T A Bj = 0 for i ≠ j.
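The pencil machinery above can be sketched numerically. The two circles below are hypothetical stand-ins for the real circle C and the virtual back-projected conic A, and the helper names are ours:

```python
import numpy as np

def signature(D, tol=1e-8):
    """Projective signature of a real symmetric conic matrix, up to sign."""
    w = np.linalg.eigvalsh(D)
    p, n = int(np.sum(w > tol)), int(np.sum(w < -tol))
    return (max(p, n), min(p, n))       # global sign of D is irrelevant

def line_pair(D):
    """Split a rank-2 symmetric matrix D = l m^T + m l^T into lines l, m."""
    w, V = np.linalg.eigh(D)
    i, k = np.argmax(w), np.argmin(w)   # positive and negative eigenvalues
    a, b = np.sqrt(w[i] / 2), np.sqrt(-w[k] / 2)
    return a * V[:, i] + b * V[:, k], a * V[:, i] - b * V[:, k]

# pencil of a real circle C and a virtual circle A on the same plane
C = np.diag([1.0, 1.0, -1.0])                        # unit circle
A = np.array([[1, 0, -2], [0, 1, 0], [-2, 0, 5.0]])  # virtual: r^2 = -1
lams = np.linalg.eigvals(np.linalg.inv(C) @ A)       # generalized eigenvalues
degs = [A - lam * C for lam in np.real(lams)]        # degenerate members
sigs = [signature(D) for D in degs]
```

On this example, exactly one degenerate member has signature (1,1), and its line pair contains the line at infinity (0, 0, 1), as stated above.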
Similarly, in the image plane, if E denotes the image of the circle C, the set of matrices {ω − λE} also defines a conic pencil, whose members are the images of the members of the pencil {A − λC}. Hence, the line-pair of this pencil that includes the image of l∞, i.e., the vanishing line, can always be identified, since it is the only degenerate member with signature (1,1). Nevertheless, at this step, it is impossible to distinguish the vanishing line from the other line of the pair, δ, the image of Δ.
Assume that all matrices C, A, E and ω are normalized to have a unit determinant. It is known that, in this case, the pencil parameters are preserved under the homography, so the generalized eigenvalues of the matrix-pair (ω, E) are exactly the same as those of (A, C). It can be shown that these eigenvalues can always be sorted such that D1 = A − λ1C is the (sole) degenerate conic with signature (1,1). Remember that the image of D1 is the conic which contains the vanishing line plus δ, two a priori indistinguishable lines. Because the matrix D1 is real, symmetric, of rank 2 and order 3, its generalized eigen-decomposition using the base point vectors writes as follows:
from which the two lines of the pair, and hence the two solutions for the normal to π in the camera frame, can be derived; this explains the known twofold ambiguity in the plane pose [Chen2004a].
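Given the calibration matrix, each candidate vanishing line maps to a candidate plane normal through the standard relation n ∝ K^T l [RichardHartley2003]. A minimal sketch, where the function name is ours:

```python
import numpy as np

def normal_from_vanishing_line(K, l):
    """Unit normal of the support plane, in the camera frame, from the
    vanishing line l of the plane (homogeneous 3-vector): n ∝ K^T l."""
    n = K.T @ np.asarray(l, float)
    return n / np.linalg.norm(n)
```

Applying it to the two candidate lines of the pair yields the two candidate normals of the twofold ambiguity.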
IV-B About removing the twofold ambiguity
We have seen that there are two solutions for the vanishing line (or for the plane normal in the calibrated case) which are, in general, not distinguishable. In this section, we discuss whether known configurations allow the ambiguity to be removed. We extend the theoretical framework proposed in §IV-A with the point O (on the support plane π) where the optical axis cuts π, plus the line L obtained by intersecting π and the principal plane of the camera, i.e., the 3D plane through the camera centre parallel to the image plane (L is orthogonal to the orthogonal projection of the optical axis onto π). Now, let L′ denote the line parallel to L through the circle centre. Within this geometrical framework, we can claim, for instance, that a sufficient condition for the ambiguity to be resolved is given by the two following conditions:
O and the orthogonal projection of the camera centre onto π lie on the same side of L′;
the intersection point of L′ with the orthogonal projection of the optical axis onto π lies outside the circle centred at the circle centre with the same radius as the marker circle.
Figure 1 illustrates this important result. We are convinced that future investigations using this framework can help reveal more configurations in which the ambiguity can be removed. We now give more geometrical insight indicating how to determine such configurations, via the following propositions. The first one is the second key result, which is the building block of our approach:
Proposition 1 (second key result)
The line Δ of D1 separates the two base points B2 and B3. Hence, denoting the normalized vector of Δ by u, the two products u^T B2 and u^T B3 have opposite signs.
These two inequalities hold under any affine transformation but not under a general projective transformation.
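The separation test of Proposition 1 can be checked numerically from the signs of the incidence products. A small sketch with a hypothetical helper; since the test is only valid in the affine setting noted above, points are first normalized to their affine representatives:

```python
import numpy as np

def separates(l, p, q, eps=1e-12):
    """True if the line l separates the finite points p and q, all given as
    homogeneous 3-vectors (points are reduced to affine representatives)."""
    l, p, q = (np.asarray(v, float) for v in (l, p, q))
    p, q = p / p[2], q / q[2]          # affine normalization of the points
    return float(l @ p) * float(l @ q) < -eps
```

For instance, the line x = 0 (vector (1, 0, 0)) separates the points (1, 1) and (−1, 1), but not the points (1, 1) and (2, 0).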
How can the conditions in Proposition 1 be helpful in removing the plane pose ambiguity? Can we state a corollary saying that, in the image plane, under some known geometric configuration, we know which line of the pair, being the image of Δ, always separates the images of the base points B2 and B3, while the other does not? If so, since the vectors of these base points are the generalized eigenvectors of (ω, E) and can be straightforwardly computed, we could remove the ambiguity by choosing as vanishing line the “correct” line of the pair. We claim the following proposition for this corollary to hold; its (omitted) proof directly follows from the properties of quasi-affineness w.r.t. the base points [RichardHartley2003].
Proposition 2: when the base points B2 and B3 lie either both in front of or both behind the camera, i.e., on the same half-plane bounded by L, the image δ of Δ separates the images of the base points while the other line of the pair does not. Otherwise, the roles of the two lines are exchanged.
Now let us investigate a formal condition stating when B2 and B3 lie on the same half-plane bounded by L. Consider a Euclidean representation of the projective world in which the origin is the point O at which the optical axis cuts the plane π. Let the x-axis be parallel to the line L and let the y-axis be the orthogonal projection of the optical axis onto π. Consequently, the z-axis is directed along the normal to π, as shown in figure 1. Let the 3D cartesian coordinates of the camera centre be parameterized by the angle θ between the y-axis and the optical axis in the yz-plane (note that we choose the scale such that the camera centre is at distance 1 from the origin O). The direction of the optical axis is therefore determined by θ.
In the 2D representation of the projective plane (i.e., of the xy-plane), let the circle have centre (xc, yc) and radius r. It can be shown, using symbolic software like Maple (https://fr.maplesoft.com/), that the base points B2 and B3 lie, in the world plane, on the same side of L if and only if two inequalities in (xc, yc, r, θ) are satisfied; when both are reversed, B2 and B3 lie on opposite sides of L. The former inequality says that O must lie on the same side of L′, the line parallel to L through the circle centre, as the orthogonal projection of the camera centre onto π. The latter inequality says that the intersection point of L′ with the orthogonal projection of the optical axis onto π must lie outside the circle centred at the circle centre with the same radius as the marker circle. As we are then in the “otherwise” part of Proposition 2, the vanishing line is given by the line that does not separate the images of the base points. This is the result announced at the beginning of this section.
IV-C Defining default intrinsics for the camera
In the previous sections we have seen that, provided the camera intrinsics are known, there is a twofold solution for the vanishing line. Recovering accurate intrinsics generally requires a calibration procedure. In many applications, the camera model can be simplified to reduce the number of parameters. A very common model is that of a camera with square pixels and the principal point at the centre of the image plane. Consequently, the focal length is the sole unknown, e.g., for self-calibration purposes [Pollefeys1999]. The focal length value is sometimes available through EXIF data stored in digital images or video files, through the camera hardware top-level API (Android, iOS), or through data provided by manufacturers on their websites. The focal length in pixels (what we need) can be obtained from such data if we can find the field of view as an angle or the 35 mm-equivalent focal length. However, the focal length is very often given in millimetres without the sensor size required to convert it to pixels.
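The conversions mentioned above can be made explicit. A small sketch, assuming the usual 36 mm full-frame sensor width for the 35 mm-equivalent convention; the function names are ours:

```python
import math

def focal_px_from_fov(image_width_px, hfov_deg):
    """Focal length in pixels from the horizontal field of view (degrees)."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

def focal_px_from_35mm(image_width_px, f35_mm, full_frame_width_mm=36.0):
    """Focal length in pixels from the 35 mm-equivalent focal length
    (36 mm is the width of a full-frame sensor)."""
    return f35_mm * image_width_px / full_frame_width_mm
```

For a 1920-pixel-wide image with a 60° horizontal field of view, this gives a focal length of about 1663 pixels.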
We consider here the case where the camera cannot be calibrated by any of the methods mentioned above. What can be done then? We propose to design a generic camera model delivering default intrinsics (i.e., a default focal length), based on the off-line calibration of several smartphone cameras. While a camera can in general have any focal length value, the optics and the sensor of smartphones are constrained by the device size and the desired field of view. Why is this reasonable? We found out, surprisingly enough, that it is not necessary to have very accurate intrinsics to estimate the vanishing line from the image of a single circle. In fact, as shown in the experimental section V, this estimation is very robust to fluctuations of the intrinsics.
After calibrating a dozen camera devices and obtaining data from the manufacturers of twenty more smartphones, we estimated a Gaussian model of the 35 mm-equivalent focal length, as shown in figure 2. More precisely, we fitted a Gaussian function (in blue) to the focal values collected or estimated (in red) from different smartphone device brands, and we use the resulting average focal length, together with its variance, as our default model.
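Fitting such a Gaussian model amounts to estimating a mean and a standard deviation from the collected focal values. A sketch with purely hypothetical sample values (the real values are those shown in figure 2):

```python
import numpy as np

# hypothetical 35 mm-equivalent focal lengths (mm) of a few smartphone cameras
f35 = np.array([26.0, 26.5, 27.0, 27.5, 28.0, 28.0, 29.0, 30.0])

mu = f35.mean()              # default focal length of the generic model
sigma = f35.std(ddof=1)      # spread of the model

def gaussian(f, mu=mu, sigma=sigma):
    """Density of the fitted Gaussian model."""
    return np.exp(-0.5 * ((f - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
```

The mean then serves as the default 35 mm-equivalent focal length, while the standard deviation quantifies how far a given device may deviate from it.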
V Experimental results
V-A Test Description
The goal of the tests presented in this section is to evaluate the proposed method for estimating the pose of a camera. We performed these tests on synthetic and real images under the conditions illustrated in figure 3.
In order to limit the poses used in the experiments, we made some hypotheses. First, we suppose that the camera focuses on the centre of the marker, i.e., the principal axis of the camera passes through the centre of the marker, see figure 3. Then, the roll angle has been set to zero. Indeed, we can simulate any such angle by rotating the image, using the assumption that the principal point is at the centre of the image and that the principal axis is orthogonal to the image plane. Finally, the rotation angle around the plane normal has been fixed to zero, as estimating the 2D rotation around the plane normal is out of the scope of this article. The remaining variables whose variations are studied in our tests are the angle between the marker plane and the camera and the distance from the marker to the camera.
We know that introducing generic camera parameters, as proposed in section IV-C, should have a negative impact on the accuracy of the pose estimation. Consequently, one of the objectives of this experiment is to evaluate the sensitivity of the proposed method to an inaccurate camera focal parameter. The observation of the distribution of the focal lengths of various smartphone cameras, see figure 2, reveals that all 35 mm-equivalent focal lengths lie within a limited range around the average value. Five different focal values that span this range are therefore used in the experiment. In order to generate synthetic images, we simulated a synthetic camera with a known focal length and image resolution. To obtain real images, we used the camera of a smartphone, which was calibrated with the OpenCV library (https://opencv.org/). In both cases, we suppose that ellipses have first been detected in the images, i.e., contour points are detected and then ellipses are estimated [Szpak2015]. We evaluate the impact of the errors of this estimation on the quality of the results. Consequently, in our synthetic tests, we also simulated noise on the detection of the ellipses, i.e., errors on the pixels that belong to the ellipse. More precisely, the edge points of the ellipse were translated with zero-mean Gaussian noise.
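The simulated noisy ellipse detection can be sketched as follows: edge points are sampled, perturbed with zero-mean Gaussian noise, and an ellipse is recovered by an algebraic least-squares conic fit, used here as a simple stand-in for the estimator of [Szpak2015]. The helper names and numerical values are ours:

```python
import numpy as np

def ellipse_points(cx, cy, a, b, n=200, sigma=0.5, rng=None):
    """Sample n edge points of an axis-aligned ellipse and add
    zero-mean Gaussian detection noise of std sigma (pixels)."""
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    x = cx + a * np.cos(t) + rng.normal(0, sigma, n)
    y = cy + b * np.sin(t) + rng.normal(0, sigma, n)
    return x, y

def fit_conic(x, y):
    """Least-squares conic fit  A x^2 + B xy + C y^2 + D x + E y + F = 0."""
    M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(M)
    A, B, Cc, D, E, F = Vt[-1]          # null vector = conic coefficients
    centre = np.linalg.solve([[2 * A, B], [B, 2 * Cc]], [-D, -E])
    return (A, B, Cc, D, E, F), centre
```

With half-pixel noise on a few hundred edge points, the fitted centre stays within a couple of pixels of the true one, which mimics the detection errors studied in the experiment.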
Finally, we evaluate the quality of the results obtained by using three different measurements relative to the pose and the reprojection accuracy:
Error on the normal of the plane relative to the camera;
Error on the position of the marker;
Error of reprojection of 3D points close to the marker.
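The three measurements above can be implemented directly. A sketch with hypothetical function names:

```python
import numpy as np

def normal_error_deg(n_est, n_true):
    """Angle, in degrees, between estimated and true plane normals."""
    n_est, n_true = np.asarray(n_est, float), np.asarray(n_true, float)
    c = abs(n_est @ n_true) / (np.linalg.norm(n_est) * np.linalg.norm(n_true))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def position_error(t_est, t_true):
    """Euclidean distance between estimated and true marker positions."""
    return float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_true, float)))

def reprojection_error(K, R, t, pts3d, pts2d_ref):
    """Mean pixel distance between projections under (R, t) and references."""
    P = K @ np.hstack([R, np.asarray(t, float).reshape(3, 1)])
    X = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = (P @ X.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return float(np.mean(np.linalg.norm(proj - pts2d_ref, axis=1)))
```

These are the quantities plotted in figures 4 and 5.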
Each curve illustrates the results obtained by applying a modifier to the focal length used for the pose estimation. The resulting errors are displayed as a function of the distance to the marker, over an interval expressed relative to the diameter of the marker. This interval corresponds to the distances at which a marker can be detected and recognized in an augmented reality application, i.e., the distances at which the marker occupies at least 80 pixels. We also show results for three different angle values, displayed in three sub-figures.
V-B Analysis of the results
Results on synthetic images are presented in figure 4. In figure 4(a), we show the error on the estimation of the orientation of the pose. We can notice that as the distance from the marker to the camera increases, the error on the pose orientation also increases. This relation is even more noticeable when the angle between the marker plane and the camera is the lowest, i.e., the graph on the left. In figure 4(b), we can see that in the calibrated case the error in position stays low and does not depend on the distance to the camera or on the angle between the marker plane and the camera. In the uncalibrated cases, as expected, the detection of the ellipses becomes less accurate when the distance increases and, consequently, the quality of the estimation of the marker position is also affected. In fact, the error in position increases linearly with the distance. This observation is quite intuitive: indeed, observing a marker with a zoom or taking its image from closer leads to very similar marker shapes. The error on the reprojection of 3D points, presented in figure 4(c), shows that, with a well-estimated focal length, the higher the distance, the higher the error; whereas, when the focal length is not well estimated, the higher the distance, the lower the error and, more importantly, this error is quite close to that obtained when the focal length is correctly estimated. This means that using generic parameters does not affect the quality of the reprojection in a context where the marker is far from the camera.
Figure 5 allows us to draw similar conclusions on real images. The 3D point reprojection error is presented. The error in the calibrated case slightly increases with the distance, as observed in figure 5(a). When the marker is close to the camera, the reprojection error with an incorrectly calibrated camera is high, but it drastically decreases when the distance to the camera increases; finally, this error is of the same order as that obtained in the calibrated case. This observation is not really a surprise, as the projection of a distant object loses its perspective with distance. Again, this result illustrates the interest of using generic camera parameters in augmented reality.
In this paper, we introduced a method to estimate the pose of a camera from the image of a circular marker in the calibrated case. While, in the general case, two solutions are found, some assumptions on the geometric configuration can help distinguish the correct pose. Moreover, we demonstrated the interest of using default camera parameters in the context of augmented reality. In particular, the presented results showed that, in the case of a distant marker, the 3D reprojection error is low enough. Future work will be to use more information from the marker environment to increase the stability of the marker detection and the pose estimation, and to allow decoding from longer distances.