Multi-view geometry (mvg) is a Computer Vision (CV) subfield that attempts to understand the structure of the 3D world given a collection of its images (hartley2003multiple). Just as binocular human vision is naturally 3D, the same underlying principles allow mvg reconstruction methods to recover the 3D structure of the world. However, a prerequisite is to have calibrated cameras, an assumption that is violated in unordered image sets. In this paper we focus on self-calibration and multi-view reconstruction using relations between camera pairs.
Assuming a camera pair whose only unknown internal parameters are the two (different) focal lengths, a standard approach to self-calibration and metric reconstruction first applies the 7-point algorithm (hartley2003multiple) inside a RANSAC (ransacalgcit) procedure to find the fundamental matrix. In this projective framework, the Kruppa equations (krupparef) are used to determine the unknown focal lengths. Next, applying the 5-point algorithm (5palgcit) inside a RANSAC procedure leads to a metric reconstruction. Since the focal lengths are recovered in a projective framework, only epipolar geometry constraints may be used to check the plausibility of the solution. Solving the self-calibration and metric reconstruction problems simultaneously permits the application of the more intuitive and restrictive geometric arguments of the metric framework.
Self-calibration methods are derived from relations on the dual absolute conic (DAC) and the dual image of the absolute conic (DIAC) (176; 195; krupparef). However, existing methods require three or four images to provide a solution (176; 195), use numerical methods to determine the DAC (176), provide an initial DAC estimate that violates the rank-2 condition (176; 195) and do not examine the relations between the recovered putative solutions (176). In mvg reconstructions, additional assumptions have been made to determine focal lengths, such as the availability of EXIF tags (sfmrot2; snavely2010bundler), equality of focal lengths across all images (martinec2007robust; stewenius2005minimal) and vanishing point correspondences (sinha2010multi).
Towards a multi-view reconstruction, camera pairs have been utilized. Different estimates for a rotation matrix can be combined with a rotation averaging algorithm (rotavealg2) and reconstructions of pairs of images can be combined with rotation registration methods (rotreg1; rotreg2; rotavealg1) to initialize an instance of Structure-from-Motion with known rotations (sfmrot1; sfmrot2) and produce a multiple-view reconstruction.
Erroneous solutions in mvg problems are directly caused by erroneous or noisy image correspondences. Two complementary approaches have been utilised to improve the validity of the recovered solution: applying RANSAC procedures to repeatedly sample minimal point sets, and verifying the initial point correspondences (chum2012homography; snavely2006photo).
In this paper, we derive a linear method for the self-calibration and metric reconstruction of camera pairs with unknown and different focal lengths, unifying two problems that were previously solved independently into a single system of equations. We further disambiguate the two solutions recovered by our method through the derivation of two theorems about the solutions’ relations. We improve the robustness and applicability of this method by introducing a procedure to verify tentative point correspondences between images, using the Longest Common/Increasing Subsequence (LCS/LIS) problem (fredman1975computing). The verification method is tailored for outdoor scenes of buildings, and is based on enforcing expected geometric properties of such scenes. We integrate our aforementioned methods into a multi-view reconstruction pipeline, utilizing -norm algorithms and introducing a method to average different estimates of a single focal length , which uses the structure of the problem, specifically that each estimate for comes from a pair of images and is therefore paired with a second estimate .
The rest of this article is organized as follows. Section 2 provides background on the reconstruction problem and the verification of image correspondences. Section 3 introduces our method for self-calibration and metric reconstruction. Section 4 presents our method for verifying correspondences between images: Section 4.1 presents a method that reduces to the LCS problem, and Section 4.2 presents the final practical algorithm. In Section 5, we integrate our methods into a reconstruction pipeline. In doing so, we develop novel averaging methods for estimates recovered from image pairs. Results for camera pair reconstruction, correspondence verification, focal length averaging and multi-view reconstruction are given in Section 6.
2 Background & Related Work
In the following, bold font is used for vectors and upper-case normal font is reserved for matrices.
2.1 Elements of multiple view geometry
In this section we summarize basic notions about the projection of 3D scenes to 2D planes (hartley2003multiple; faugeras2004geometry). In a metric reconstruction parallel world lines converge at the plane at infinity : . The absolute conic is a conic on which satisfies , , where is the homogeneous representation of world points.
By taking all the planes tangent to , we construct , which is the dual surface of . is described in a metric reconstruction by the matrix
Now, considering projective reconstructions of 3-space and projections to image plane we have the following Results (hartley2003multiple):
The projection of by projection matrix in the image plane is the dual conic
If the 3-space is transformed by homography , that is , then planes of 3-space are transformed according to
If is a matrix representing a projective transformation of 3-space, then the fundamental matrices corresponding to the pairs of camera matrices , are the same.
Suppose the matrix can be decomposed in two different ways as
for some non-zero constant and 3-vector
Using the preceding Results, we formulate the equations to solve the camera self-calibration problem and to determine position in a projective reconstruction.
We summarize our notation in Table 1.
|M||Metric Reconstruction, e.g.|
|P||Projective Reconstruction, e.g.|
|1(or 2)||Refers to Camera 1(or 2) in camera pair, e.g|
|1(or 2)||Refers to solution 1(or 2) for the second camera, e.g|
|Accents, as in and||Discriminate between the 2 solutions for camera 2|
|P matrix representations|
|i-row vector of left P matrix block|
|Projective Reconstruction in canonical form|
|Appear in some Lemmas|
2.3 Verifying point correspondences
In standard approaches to finding image correspondences, regions of interest are described by local feature descriptors (lowe2004distinctive). Consequently, erroneous matches occur between image regions of similar local appearance.
A way to reject erroneous matches is to use arguments about the geometry of the depicted scenes (geometric verification). In SIFT features, the Hough transform was used to acquire the orientation of the detected features (lowe2004distinctive). Another common approach applies a rudimentary transform (e.g., affine, similarity) between the images, to reject some correspondences before fitting the full model (turcot2009better; philbin2007object; perd2009efficient; chum2004enhancing). For these methods we mention that:
Most require specific image features to be extracted, from which special parameters are used to fit the transform
Result in rejection of a large number of correspondences
When we tried using a similarity transform to geometrically verify matches in the reconstruction problem, our results did not improve
Another direction is to improve the covariance of local feature descriptors (perd2009efficient; chum2012homography; chum2006geometric). The regions of interest can be first transformed before extracting a feature descriptor (perd2009efficient) or ellipses may be matched instead of points (chum2012homography).
Finally, the neighborhoods of putative matched points are examined in some verification methods. Such approaches include counting the number of correspondences between the neighborhoods of two tentative point matches (sivic2003video) or examining the order of matched features between the neighborhoods and counting the number of features out of order (wu2009bundling). The geometric verification method we propose uses properties concerning the order of matched points as well.
2.4 Approaches to multiple view reconstruction
In a reconstruction pipeline, initially Structure from Motion (SfM) is solved to get assuming image point correspondences and self-calibrated cameras. The fundamental method to solve SfM is Bundle Adjustment (BA) (BAalgcit), an iterative, numerical algorithm to minimize the reprojection error of the recovered solution.
In standard approaches to SfM, a sequence of SfM sub-problems is solved (sequential SfM) (snavely2010bundler; snavely2006photo; wu2011multicore). In each iteration, more cameras (possibly uncalibrated) and world points are added to the SfM problem, which is solved using BA. However, such methods are sensitive to the initial camera pair selection, solve a large number of optimization problems numerically and optimize an objective function with possibly multiple local minima.
A different approach has been developed for solving the SfM with known Rotations problem within the framework of optimal algorithms in multiple-view geometry (mvg) and mvg algorithms (dalalyan2009l_1; hartley2007optimal; sfmrot1; sfmrot2; olsson2010generalized; zach2010practical). In this formulation, the camera rotation matrices are given. SfM is formulated as a convex-optimization problem, for which a unique global minimum exists. For the actual solution of SfM with known rotations, either a sequence of second-order cone programs (SOCP) is solved to arrive at an exact solution, or approximate solutions are recovered by solving SOCP or linear programs (martinec2007robust; enqvist2011non; sfmrot2; sinha2010multi). BA may still be applied as a final fine-tuning of the solution.
An SfM solution allows the reconstruction of a small number of 3D points (sparse point cloud), limited by the number of image point correspondences. Multi-view stereo (mvs) algorithms can be used at this point to produce a dense point cloud, which contains a much larger number of 3D points (furukawa2010accurate). Finally, surface reconstruction algorithms can be used to produce a 3D surface (kazhdan2006poisson).
3 A method for Metric Reconstruction in pairs of Uncalibrated Images
3.1 Formulation of System Equations
Let us consider two cameras and further that coordinate system is aligned with the world coordinate system. Let us further assume, that the corresponding image coordinate systems are selected so that the internal parameters of each camera can be written as
where is the focal length. The previous assumptions are routinely employed in multiple view geometry and are thoroughly discussed in the literature (hartley2003multiple).
We start from a projective reconstruction of the 2 cameras, given by , which is related to the metric reconstruction by a world (3D) homography as in
where is yet undetermined and the scale factor can be ignored ().
To fully determine , we turn to the plane at infinity
Using Result 2 we arrive at
Eq. (5) comprises a non-linear system with respect to the five unknowns (plane at infinity coordinates and focal lengths) that we seek to determine in order to acquire a metric reconstruction of the scene. We note that is symmetric by definition, and is also homogeneous, thus it provides five independent equations.
In Eq. (5), we substitute
We group the unknowns in the following complexes
The augmented matrix for the linear system
is then given by
We derived the above equations (in order of appearance) from elements , , , , , of . In the following, we use the first five equations as explained in Section 3.4.
The matrix of Eq. (8) is rank deficient. Thus, we have a linear system of (in the best case) five linearly-independent equations in six unknowns. To solve it, we turn to the polynomial relations between the coordinates of .
3.3 Recovering the solutions
Taking five of Eqs. (7) we have the linear system
Applying Gaussian elimination to (9), we bring the augmented matrix to the form
The elements in default font, are in the usual form expected when we apply Gaussian elimination in the general case
The elements in green font, are a result of the problem’s structure, that is of the special relations in Eq. (10)
Finally, the element in blue font, is as given when we use the canonical representation for the projective reconstruction, which is:
Where is the left null vector of , . By using the canonical pair, the leftmost block in is rank , and consequently has linearly-dependent row-vectors
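The canonical pair referred to above can be illustrated with a minimal sketch. It assumes the standard construction from the multiple-view geometry literature (hartley2003multiple), $P_1 = [I \mid 0]$ and $P_2 = [[e']_\times F \mid e']$, where $e'$ is the left null vector of $F$. The function names `skew`, `canonical_cameras` and `fundamental_from_cameras` are hypothetical helpers introduced here for illustration, and the synthetic $F$ is made-up data.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def canonical_cameras(F):
    """Assumed canonical pair P1 = [I | 0], P2 = [[e2]_x F | e2] for a rank-2 F."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]                      # left null vector: e2^T F = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

def fundamental_from_cameras(P1, P2):
    """F = [e2]_x P2 P1^+, with e2 = P2 C1 and C1 the null vector (centre) of P1."""
    C1 = np.linalg.svd(P1)[2][-1]
    e2 = P2 @ C1
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Synthetic rank-2 fundamental matrix (illustrative values only).
rng = np.random.default_rng(0)
F = skew(rng.normal(size=3)) @ rng.normal(size=(3, 3))
P1, P2 = canonical_cameras(F)
F_back = fundamental_from_cameras(P1, P2)
```

Up to scale and sign, `F_back` agrees with `F`, and the left 3x3 block of `P2` has rank 2, which is the rank deficiency the text relies on.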
The derivation of Eq. (10) is given in the Supplementary Material.
To solve for the focal lengths () and (), we now have from (10)
We substitute , from (16), and , from (15), to Eq. (14), and obtain a second-order equation with respect to . Thus, we determine uniquely and with a two-way ambiguity. We refer to those two solutions as
3.4 The effect of homogeneous representation on the derived equations
In homogeneous coordinate systems representations equal up to a multiplicative constant refer to the same entity. We explore here how this ambiguity affects the formulation of Eq. (7).
, be the ground truth camera matrices we aim to recover
, be the starting projective reconstruction
is related to by the homography
Thus, we get from to by homography , from to by and from to by .
For the camera pairs, we have the Fundamental matrices
From Result 3, since reconstructions are related by , the reconstructions share a common Fundamental matrix. Since Fundamental matrices are homogeneous entities, we have
Now, we turn to Result 3 and get
We write the previous equations in matrix form to get the projective transformation
Now, we get from
We set the bottom-right element to , as we disregard the true scale of the reconstruction, and get the final form of
We observe that the translation direction is correct but the left-most block of camera is multiplied by a constant .
To avoid the determination of additional unknowns in Eq. (7), we have
All equations derived from elements off the diagonal are of the form , thus the constant can be eliminated
The equation derived from element cannot be used without determining additional constants. So, we may only use the remaining five of the six original equations of (7)
The complete method to solve the metric reconstruction and self calibration problem follows:
We solve the system (7), keeping five equations and discarding the equation derived from
In the previous step (1), we recovered . To fully determine , many different approaches are possible. We propose to repeat step 1, putting camera 2 at the origin of the coordinate system (in place of camera 1). This can be done by transposing the Fundamental matrix for the camera pair. Following this approach, we may additionally determine the constant
Using the homography of Eq. (4) or Eq. (19), we recover the metric reconstruction . Depending on the homography used, one camera matrix ( for Eq. (4) or for Eq. (19)) will have the left-most block multiplied by a constant. This has no effect on the correctness of the representation, and the image points are the same in each case
3.5 Solution disambiguation and geometric relations of the two solutions
We use the Cheirality condition (Corollary 1) to determine the valid solution of Eq. (7). Whenever the two recovered solutions represent cameras with divergent viewing directions, the Cheirality condition is more likely to identify the valid solution. In Theorems 1 and 2 we explore the geometric relations between the two solutions, aiming to visualize how the solutions relate and how they can be disambiguated. Proofs of Theorems 1 and 2 are outlined in Figs. 2 and 3. Full proofs are given in the Supplementary Material.
denote the reconstructions derived from (17). Then, cameras are in mirror positions with respect to the origin (position of ). The centers of projection satisfy
Let camera be positioned on the origin of the world coordinate system, with a viewing direction aligned to axis. We denote the viewing directions of and the position vectors of the corresponding centers of projection. Then, bisect the angles formed by , in the plane defined by . Thus, we have:
The correct one of solutions (17) can be identified by requiring all world points that are visible from camera to be in the space in front of camera .
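A cheirality test of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard signed-depth formula for a finite camera $P = [M \mid p_4]$ (hartley2003multiple), and the names `depth` and `pick_by_cheirality`, as well as the toy cameras and points, are assumptions made here. The `sign(det M)` factor keeps the test valid when the left block of a camera is multiplied by a constant.

```python
import numpy as np

def depth(P, X):
    """Signed depth of homogeneous point X (4-vector) with respect to camera P (3x4)."""
    M = P[:, :3]
    w = (P @ X)[2]
    return np.sign(np.linalg.det(M)) * w / (X[3] * np.linalg.norm(M[2]))

def pick_by_cheirality(P1, candidates, points_per_candidate):
    """Index of the candidate second camera with most points in front of both cameras."""
    scores = []
    for P2, pts in zip(candidates, points_per_candidate):
        scores.append(int(sum(depth(P1, X) > 0 and depth(P2, X) > 0 for X in pts)))
    return int(np.argmax(scores)), scores

# Toy data (illustrative only): camera 1 at the origin looking along +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2_a = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])                   # looks along +z
P2_b = np.hstack([np.diag([-1.0, 1.0, -1.0]), [[1.0], [0.0], [0.0]]])   # looks along -z
pts = [np.array([0.0, 0.0, 5.0, 1.0]), np.array([1.0, -1.0, 4.0, 1.0])]
best, scores = pick_by_cheirality(P1, [P2_a, P2_b], [pts, pts])
```

In practice each candidate solution would come with its own triangulated points; the toy example reuses one point set only to keep the sketch short.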
4 Geometric verification of tentative image correspondences
4.1 Reduction to Longest Common Subsequence problem
The geometric property we seek to enforce on tentative image correspondences is the order of imaged points with respect to the horizontal and vertical image directions. We have:
If a point, A, is imaged to the left of a point, B, in the first image, then A should be to the left of B in the second image as well. We call this property Consistency-x
Similarly, if a point, A, is below another point, B, in the first image, then A should be below B in the second image as well. We call this property Consistency-y
To see how we can arrive at the LCS/LIS problem we examine each one of the two Consistency properties independently. We present here the analysis concerning Consistency-x.
We start with a formal definition of Consistency-x. A set of correspondences , where is the x-coordinate of a point () in image , has Consistency-x, if for all points of image in :
All points in image that are in the Consistent-x set and are to the left of match in image with points that are to the left of :
All points in image that are in the Consistent-x set and are to the right of match in image with points that are to the right of :
We seek the most populous set of correspondences which is Consistent-x. We can reduce the Consistency-x problem to LIS in the following way: We sort points in image 1 with respect to the x-axis (). This sorting is a permutation in the sequence of correspondences. We apply this same permutation to and get a sequence from the ordinates of points . We seek the LIS of this last sequence.
The LCS/LIS problems are efficiently solved with complexity $O(n \log n)$ (aldous1999longest; hunt1977fast; fredman1975computing) or even $O(n \log \log n)$ (van1975preserving) if special data structures are implemented. To solve LCS/LIS, we used the patience sorting algorithm (aldous1999longest).
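The reduction of Consistency-x to LIS can be sketched in a few lines. The code below is an illustrative sketch under simplifying assumptions (strictly increasing order, no ties in the coordinates); `lis_indices` and `consistent_x` are hypothetical names, and the binary-search variant of patience sorting shown here is one common way to obtain an LIS, not necessarily the exact routine used by the authors.

```python
from bisect import bisect_left

def lis_indices(seq):
    """Indices of one longest strictly increasing subsequence (patience sorting)."""
    tails, tails_idx = [], []      # tails[k]: smallest tail of a subsequence of length k+1
    prev = [-1] * len(seq)         # back-pointers used to recover the subsequence
    for i, v in enumerate(seq):
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]

def consistent_x(x1, x2):
    """Largest set of correspondences whose left-to-right order agrees in both images."""
    order = sorted(range(len(x1)), key=lambda i: x1[i])   # sort by x in image 1
    permuted = [x2[i] for i in order]                      # same permutation on image 2
    return sorted(order[k] for k in lis_indices(permuted))

# Toy correspondences: the second match jumps far to the right in image 2.
x1 = [10, 20, 30, 40, 50]
x2 = [12, 80, 33, 41, 55]
kept = consistent_x(x1, x2)
```

Here `kept` contains the indices of the correspondences that survive; the out-of-order match at index 1 is dropped.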
4.1.1 Perplexities of the combined Consistency-x,y problem and an efficient approximate method
In the combined Consistency-x,y problem we seek to find the largest subset of correspondences which are consistent in both x and y axes. The relation “Consistency-x and Consistency-y” is not transitive. We can observe that easily with a counter-example. Thus, the Consistency-x,y relation is not a partial order, a condition sufficient to rule out reduction to LCS/LIS (fredman1975computing).
Formally, in the Consistency-x,y problem, we seek a set of image correspondences so that:
has the Consistency-x property
has the Consistency-y property
The number of elements () of the set is maximized
We propose an approximate solution by the following method:
We find the largest Consistent-x subset, , solving an LIS problem
We find the largest Consistent-y subset of , solving again an LIS problem
In our suboptimal solution Consistency-x,y holds, thus our primary aim to reject erroneous matches is achieved. Nevertheless, some true matches are rejected.
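The two-pass approximation can be sketched as follows. For brevity this sketch uses a simple quadratic-time dynamic-programming LIS rather than patience sorting; that substitution, the helper names `lis_indices`, `largest_consistent` and `consistent_xy`, and the toy points are all assumptions made for illustration.

```python
def lis_indices(seq):
    """Indices of one longest strictly increasing subsequence (O(n^2) DP, for brevity)."""
    n = len(seq)
    if n == 0:
        return []
    length, prev = [1] * n, [-1] * n
    for j in range(n):
        for i in range(j):
            if seq[i] < seq[j] and length[i] + 1 > length[j]:
                length[j], prev[j] = length[i] + 1, i
    j = max(range(n), key=length.__getitem__)
    out = []
    while j != -1:
        out.append(j)
        j = prev[j]
    return out[::-1]

def largest_consistent(ids, a, b):
    """Largest subset of ids whose order along a (image 1) is preserved along b (image 2)."""
    order = sorted(ids, key=lambda i: a[i])
    keep = lis_indices([b[i] for i in order])
    return [order[k] for k in keep]

def consistent_xy(pts1, pts2):
    """Pass 1: largest Consistent-x subset. Pass 2: largest Consistent-y subset of it."""
    sx = largest_consistent(range(len(pts1)),
                            [p[0] for p in pts1], [p[0] for p in pts2])
    sxy = largest_consistent(sx, [p[1] for p in pts1], [p[1] for p in pts2])
    return sorted(sxy)

# Toy correspondences (illustrative only).
pts1 = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
pts2 = [(0, 0), (1, 7), (2, 2), (9, 3), (3, 4)]
kept = consistent_xy(pts1, pts2)
```

The result is consistent in both axes by construction, but it is not guaranteed to be the largest such set, which matches the suboptimality noted above.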
4.2 A practical verification method
The consistency properties we introduced, depend on assumptions on the geometric structure of the scene. In photos of architectural scenes, usually the axis in camera coordinate system aligns with the perpendicular to the floor vector, leading to the assumption . In special cases, as in photographs of houses on a street, the camera axis may also be aligned between photographs.
Such assumptions may be violated between different views. The effects of camera rotations on a scene are illustrated in Fig. 4. We observe that:
Lines parallel to or axis in one image may appear tilted in another, if the camera coordinate systems are not aligned. The same effect is caused by scene depth variation
The relative order of points may change between two images. Moreover, it is more likely for two points to change order with respect to the -axis, if those points are close in axis but distant in axis, .
Still, in the case of photographs of architectural scenes, we can assume small rotations around the axes, as the photographer’s position in space is constrained. In-plane () rotations are uncommon and can nevertheless be fixed automatically (gallagher2005using).
4.2.1 Approximations to Consistency properties
As Consistency properties are violated by projective phenomena, enforcing them leads to the rejection of many true correspondences. Thus, we relax Consistency properties to arrive at a practical verification method. We describe the method concerned with the order of points on the axis. Similar modifications apply to the Consistency-y property.
First, we introduce a threshold value () to allow violations in the order of points that remain within a predefined distance range. So, two consecutive sequence points , are considered in correct order if
where is the coordinate of the i-th point in the ordered sequence. We set as a fraction of the maximum distance in the axis, of any two points in the image we examine, that matched to points in the paired image:
In the following we refer to this process as “setting the threshold as a percentage of image size”.
We propose to use a recursive method, acting on image subregions of different size. We solve a sequence of LIS problems, each one with a different value, set as a percentage of image size:
We solve an LIS problem using a threshold as a percentage of image size. The result is a Consistent-x set of correspondences
We split the image in two subregions, each with equal number of correspondences . The split is done on the axis
(Recursion): We repeat the process on each of the two subregions. We terminate if the region size is smaller than a predefined constant (we used pixels)
The recursive method has the advantage of allowing for larger violations in the order in the axis for points that are distant in the axis, as explained in Section 4.2. Concerning the computational complexity, we have:
where is the number of recursion steps. depends on the initial image size and . Consequently, the recursive method adds no significant computational burden to the initial LIS problem formulation.
Finally, we remark that other approaches, such as dropping recursion or fitting a simple transform to map lines between the images to estimate the value, produced worse results than the proposed recursive method.
5 An application to the multiple-view reconstruction problem
We integrate our methods for the geometric verification of image correspondences and the pair-based estimation of , in existing pipelines to solve the multiple-view reconstruction problem and produce a 3D-model of a scene.
Our approach is outlined in Fig. 5. The final reconstruction is done using the non-sequential SfM with known rotations formulation of (sfmrot2), which we modify extensively, using the methods of the preceding sections as well as the averaging algorithms we describe in the following.
5.1 Averaging pair-based solutions for
In this paper we introduced computationally efficient methods for estimation, which we apply to randomly sampled minimal correspondence sets, in a way that resembles RANSAC procedures (ransacalgcit). The multiple , estimates, one from each minimal sample, are then averaged to produce the final solutions.
In , case, we introduce a novel averaging method. In the case of pairwise rotations , we apply the Weiszfeld algorithm (rotavealg2; rotavealg1), which converges to the median ($L_1$-average) rotation. We also use a form of the Weiszfeld algorithm (multiple rotation averaging) in the rotation registration problem to get the final camera rotation matrices (Section 5.1.2).
5.1.1 Focal length estimates
The distribution of estimates collected from all the possible image pairs can be skewed or multimodal (Fig. 6), in which case the mean or median estimate will not correctly determine value.
We introduce new measures to evaluate the fit of focal length estimates. We initially introduce the Confidence count (cc) and then modify cc using the problem structure to introduce the Joint confidence count (Jcc). We assume that in each image pair that contains image , we receive a number of correct and a number of erroneous estimates for , and that erroneous estimates originating from different image pairs vary significantly in value, whereas correct ones aggregate.
We visualize cc computation in Fig. 7. Simplifying aspects of the computation, we can describe it as a binning procedure, where the bin range is adapted to contain all estimates within deviation:
We collect all estimates of , originating from all the different images we have matched with image
For each , we count the number of estimates, , within a error range. This sum is the confidence count for estimate
We normalize values to range. This step is critical for Jcc computation
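The confidence count can be sketched as a simple neighbour count. In this illustrative sketch the error range is taken as a relative tolerance `rel_tol` around each estimate and the normalization divides by the largest count so that values fall in $[0, 1]$; both choices, the function name `confidence_counts` and the sample values are assumptions, not values taken from the paper.

```python
import numpy as np

def confidence_counts(estimates, rel_tol=0.05):
    """Normalized count of estimates lying within +/- rel_tol of each estimate.

    estimates: non-empty sequence of focal length estimates for one image.
    rel_tol:   hypothetical relative error range (assumption).
    """
    f = np.asarray(estimates, dtype=float)
    # within[i, j] is True when estimate j lies in the error range of estimate i.
    within = np.abs(f[None, :] - f[:, None]) <= rel_tol * f[:, None]
    cc = within.sum(axis=1).astype(float)
    return cc / cc.max()

# Made-up estimates: three agree near 1000, two are isolated outliers.
f_est = [1000.0, 1010.0, 990.0, 1500.0, 700.0]
cc = confidence_counts(f_est)
best_estimate = f_est[int(np.argmax(cc))]
```

Estimates that cluster together receive the highest count, while isolated estimates only count themselves.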
To further improve the estimation, we introduce Jcc (Fig. 8). Since each estimate is paired with some estimate (the estimates were computed in an image pair), we expect that if is a good estimate then will be accurate too. To compute Jcc, we follow a procedure similar to that for cc, but this time each estimate in range contributes a different amount to the Jcc sum. This amount is proportional to of estimate that is paired with . Good estimates have higher confidence counts, and contribute more to Jcc.
In greater detail, to compute the of estimate about image , we have:
Let be the images we matched with image . For each image we have:
From all estimates within range of , we pick the ones that originate from pair .
Since every estimate originating from pair is matched to an estimate, from the estimates of the previous step we get the corresponding estimates of
For each of the estimates of , we have a confidence count . We get their mean. We do not use the direct sum, to diminish the influence of a large sum (large ) of low cc’s.
is the sum of the previous mean values.
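The Jcc steps above can be sketched as follows, under loudly stated assumptions: the error range is a relative tolerance `rel_tol`, the confidence counts of the paired estimates have already been computed and normalized, and the data layout (three parallel arrays) and the name `joint_confidence_counts` are hypothetical choices made here for illustration.

```python
import numpy as np

def joint_confidence_counts(f, partner, paired_cc, rel_tol=0.05):
    """Joint confidence count for each focal length estimate of one image.

    f:         estimates of the focal length of the image under study.
    partner:   id of the paired image each estimate originates from.
    paired_cc: normalized confidence count of the estimate it is paired with
               (computed beforehand among the partner image's own estimates).
    rel_tol:   hypothetical relative error range (assumption).
    """
    f = np.asarray(f, dtype=float)
    partner = np.asarray(partner)
    paired_cc = np.asarray(paired_cc, dtype=float)
    jcc = np.zeros(len(f))
    for i, fi in enumerate(f):
        near = np.abs(f - fi) <= rel_tol * fi          # estimates in range of f[i]
        for j in np.unique(partner[near]):             # one term per partner image
            sel = near & (partner == j)
            jcc[i] += paired_cc[sel].mean()            # mean, not sum, of paired cc
    return jcc

# Made-up data: two estimates near 1000 (from pairs with images 1 and 2),
# two estimates near 1500 (both from the pair with image 1).
f_est = [1000.0, 1005.0, 1500.0, 1510.0]
partner = [1, 2, 1, 1]
paired_cc = [1.0, 0.8, 0.1, 0.3]
jcc = joint_confidence_counts(f_est, partner, paired_cc)
```

Using the mean per partner image means that many low-confidence paired estimates from a single pair cannot outweigh a few high-confidence ones.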
5.1.2 Rotation estimates
In this section, we summarize rotation averaging using the Weiszfeld algorithm (rotavealg2; rotavealg1). The Weiszfeld algorithm returns the $L_1$-mean of a set of points in space . Many different metrics have been defined for rotation matrices (rotavealg1). We limit our analysis here to
The Weiszfeld algorithm is a gradient-descent method and is guaranteed to converge to the true $L_1$-mean in the case of single rotation averaging, such as the averaging of pairwise rotation estimates .
The $L_1$-mean of estimates of a single rotation is the rotation that minimizes:
In the case of rotation registration, the convergence of the Weiszfeld algorithm is not guaranteed.
In detail, we applied Weiszfeld algorithm to weight the estimates of the pairwise rotations we acquired through random sampling of minimal point sets ( points) yielding a solution.
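Single rotation averaging with a Weiszfeld-style iteration can be sketched as below. This is a minimal illustration assuming the geodesic (angular) distance on rotations, an initial guess given by the chordal mean, and a clamp `eps` on distances to avoid division by zero; these choices and all function names are assumptions introduced here. The logarithm used is not reliable for rotation angles close to $\pi$.

```python
import numpy as np

def exp_so3(w):
    """Rotation matrix for the axis-angle vector w (Rodrigues formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-15:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Axis-angle vector of rotation matrix R (not robust near angle pi)."""
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    s = np.linalg.norm(w) / 2.0          # sin(theta)
    c = (np.trace(R) - 1.0) / 2.0        # cos(theta)
    if s < 1e-15:
        return np.zeros(3)
    return np.arctan2(s, c) * w / (2.0 * s)

def chordal_mean(rotations):
    """Projection of the summed matrices onto SO(3); used only as a starting point."""
    U, _, Vt = np.linalg.svd(sum(rotations))
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

def l1_rotation_average(rotations, iters=100, eps=1e-9):
    """Weiszfeld-style iteration towards the geodesic L1 mean of the rotations."""
    S = chordal_mean(rotations)
    for _ in range(iters):
        num, den = np.zeros(3), 0.0
        for R in rotations:
            v = log_so3(R @ S.T)                 # tangent vector from S to R
            d = max(np.linalg.norm(v), eps)      # clamped geodesic distance
            num += v / d
            den += 1.0 / d
        S = exp_so3(num / den) @ S               # weighted step on the manifold
    return S

# Made-up estimates: rotations about z by these angles (radians), one gross outlier.
angles = [0.10, 0.11, 0.11, 0.12, 1.5]
rots = [exp_so3(np.array([0.0, 0.0, a])) for a in angles]
R_med = l1_rotation_average(rots)
angle_med = log_so3(R_med)[2]
```

For this toy set the iteration settles near the median angle and is largely unaffected by the outlier, which is the behaviour that motivates an $L_1$ average over a least-squares one.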
In the rotation registration problem we applied the Weiszfeld algorithm in the following manner:
We construct the rotations graph, with one node for every image and an edge between nodes if we know the relative rotation between the respective images. We take a spanning tree in this graph, and using we get the initial estimates
For every node in the graph, we use all available estimates to get inconsistent estimates , through . We average estimates with one iteration of Weiszfeld algorithm
We repeat the previous step times ()
In all our experiments we set .
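The spanning-tree initialization of the first step can be sketched as a breadth-first traversal of the rotations graph. The sketch assumes the convention $R_j = R_{ij} R_i$ for relative rotations and fixes the first camera to the identity; the convention, the function name `init_rotations` and the toy graph are assumptions made for illustration, and the subsequent averaging iterations are not shown.

```python
import numpy as np
from collections import deque

def init_rotations(n, rel):
    """Initial absolute rotations from a BFS spanning tree of the rotations graph.

    n:   number of images (nodes).
    rel: dict {(i, j): R_ij} with the assumed convention R_j = R_ij @ R_i.
    """
    adj = {k: [] for k in range(n)}
    for (i, j), R in rel.items():
        adj[i].append((j, R))
        adj[j].append((i, R.T))        # reverse edge: R_i = R_ij^T @ R_j
    R_abs = {0: np.eye(3)}             # gauge freedom: root camera is the identity
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j, R_ij in adj[i]:
            if j not in R_abs:         # tree edge: first time node j is reached
                R_abs[j] = R_ij @ R_abs[i]
                queue.append(j)
    return R_abs

def rot_z(a):
    """Rotation about the z axis by angle a (radians); toy-data helper."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Toy graph with noise-free relative rotations (illustrative only).
truth = [rot_z(a) for a in (0.0, 0.3, 0.7, 1.2)]
edges = [(0, 1), (1, 2), (1, 3), (3, 0)]
rel = {(i, j): truth[j] @ truth[i].T for i, j in edges}
R_init = init_rotations(4, rel)
```

With noisy relative rotations the non-tree edges give conflicting estimates for each node, which is what the averaging steps above are meant to reconcile.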
6 Results & Discussion
6.1 Metric Reconstruction in Pairs of Images
We implemented Kruppa equations, a well-studied and popular method for camera self calibration, and used it as reference method for the estimation of internal camera parameters. To compare the methods, we used synthetic camera projection matrices and image correspondences. We added Gaussian noise to the image points positions, and not to world points or other entities, to simulate actual noisy correspondences.
We observed that our method (Sec. 3) and the Kruppa method produce identical estimates. In rare cases with extremely noise-corrupted correspondences, our method failed and the Kruppa method produced largely erroneous focal length estimates.
We conclude that the two methods are equivalent concerning the self-calibration problem. Still our method is advantageous in additionally providing a metric reconstruction.
Next, we evaluate camera pair reconstruction. We compared our method to the 5-Point(5P) algorithm (5palgcit). We used both approaches as initialization to BA (BAalgcit) and evaluated the quality of the final reconstructions (Tab. 2). We used a multiple-view dataset (datasetcit) and determined the relative positions of all camera pairs with point correspondences. The same focal length estimates were used in both compared approaches. estimates were obtained by the method we introduced in Sec. 3. The two methods we compared were:
Initialize BA with our method: We randomly sampled minimal subsets of correspondences and averaged the acquired solutions with rotation averaging (rotavealg2). We allowed for 20 BA iterations
Initialize BA with 5P algorithm: We used a RANSAC procedure to sample minimal 5P subsets and to pick the solution. We allowed for 20 BA iterations
To quantify the reconstruction error, we used the angle () between the relative rotation estimate and the true relative rotation, , between two paired views .
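The angular error between two rotations can be computed from the trace of their relative rotation. The snippet below is a small sketch; the function name `rotation_angle_deg` and the choice of degrees as the unit are assumptions made here.

```python
import numpy as np

def rotation_angle_deg(R_est, R_true):
    """Angle (degrees) of the residual rotation R_est @ R_true^T."""
    R = R_est @ R_true.T
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)   # guard against round-off
    return np.degrees(np.arccos(c))

# Toy check: a rotation of 10 degrees about z compared with the identity.
a = np.radians(10.0)
R_est = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
err = rotation_angle_deg(R_est, np.eye(3))
```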
The initialization of BA is important for improving convergence and reducing the computational cost. We observe that both the 5P algorithm and our method can be used as BA initialization with similar performance (Tab. 2). This result implies that to further reduce the reconstruction error, we should improve other problem parameters, such as image correspondences and focal length estimates.
6.2 Geometric verification of tentative correspondences
In geometric verification, correspondences are classified as correct or erroneous. We evaluate this classifier using precision and recall. Two different datasets were used:
Dataset (datasetcit): The set contains outdoor scenes of landmark buildings. The ground truth camera matrices, , are provided, from which we can separate correct and erroneous correspondences. In detail, from given we recover the fundamental matrix and then evaluate Sampson’s approximation to geometric error for each tentative correspondence (sampsoncit)
Dataset (wongiccv2011): This set contains both architectural scenes and scenes with objects. We give performance results on each of those two subsets independently. In this set, point correspondences between images are provided and labeled as correct or erroneous
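Labelling correspondences from a known fundamental matrix can be sketched with the standard Sampson approximation to the geometric error. The code is illustrative only: `sampson_error` is a hypothetical name, the toy $F$ corresponds to a pure sideways translation, and any threshold used to separate correct from erroneous matches would be a further assumption.

```python
import numpy as np

def sampson_error(F, x1, x2):
    """Sampson (first-order geometric) error for correspondences x1 <-> x2.

    F:      3x3 fundamental matrix with x2^T F x1 = 0 for perfect matches.
    x1, x2: (N, 2) arrays of pixel coordinates in images 1 and 2.
    """
    x1h = np.hstack([x1, np.ones((len(x1), 1))])
    x2h = np.hstack([x2, np.ones((len(x2), 1))])
    Fx1 = x1h @ F.T                    # row i is F @ x1_i
    Ftx2 = x2h @ F                     # row i is F^T @ x2_i
    num = np.sum(x2h * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

# Toy F for a pure translation along x: matches must share the same y coordinate.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
x1 = np.array([[1.0, 2.0], [1.0, 2.0]])
x2 = np.array([[5.0, 2.0], [5.0, 3.0]])
errors = sampson_error(F, x1, x2)
```

The first pair satisfies the epipolar constraint exactly and gets zero error; the second is off by one pixel in y and gets a positive error.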
The results are presented in Tables 3 and 4. Precision of the classifier is more important than its recall, as it is more important to have an outlier-free set of correspondences than to recover all true correspondences. Furthermore, the recursive verification method discards erroneous matches with very high precision. This result supports our argument that points more distant in the one, e.g. , axis are more likely to violate order with respect to the other, e.g. , axis. Finally, performance varies with scene type. The development of our method was based on scene properties found in architectural scenes. In scenes composed of objects, differences in scene geometry and the increased freedom in the viewer’s position cause more violations of the Consistency properties. In Dataset (datasetcit), the performance in scenes of one main building, and consequently of a single main horizontal and vertical direction, such as Fountain-P11, entry-P10 and Herz-Jesu-P8, reaches almost flawless precision (). In more complex scenes, which include more buildings, as in castle-P30, the achieved precision degrades to values in the range .
|Precision||0.99 0.98||0.98 0.97||0.97 0.96||0.96 0.96||0.95 0.95||0.91 0.91||0.87 0.87|
|Recall||0.80 0.64||0.89 0.72||0.93 0.76||0.94 0.78||0.96 0.80||0.95 0.80||0.96 0.81|
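For reference, the two measures reported above can be computed from predicted and ground-truth labels as in the following sketch. The function name and the boolean-list input format are assumptions made for illustration.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a correspondence classifier.

    predicted : list of bools, True if the match was accepted as correct
    truth     : list of bools, True if the match is correct in the ground truth
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # accepted, correct
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # accepted, erroneous
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)  # rejected, correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

High precision means that few erroneous matches survive verification; high recall means that few correct matches are discarded.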
6.3 Improving focal length estimation in multi-view reconstructions
Tab. 5 shows the improved estimates obtained with the confidence count (cc). The joint confidence count (Jcc) measure improves them further. To quantify the estimation error we use the measure of (chandraker2007autocalibration; gherardi2010practical; kukelova2008polynomial).
|Method||Median||Confidence count||Joint confidence count|
6.4 Multi-view reconstruction in unordered image sets
In Tab. 6, we provide quantitative performance measures for multi-view reconstructions obtained by applying the proposed pipeline (Fig. 5) to unordered image datasets, with no input other than the scene photographs. To quantify the reconstruction error in camera translation, we used the angle between the translation estimate and the available true translation. In Fig. 11 we display the results of the proposed reconstruction pipeline qualitatively. The results in Tab. 6 and Fig. 11 demonstrate that the introduced methods can be used on unordered image sets to produce quality reconstructions of the photographed scenes.
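The translation error described above can be sketched as the angle between two direction vectors. The function name and the choice of degrees as the unit are assumptions made for illustration; because the translation of a camera pair is recovered only up to scale, the measure depends on direction alone.

```python
import math


def translation_angle_deg(t_est, t_true):
    """Angle in degrees between an estimated and a true translation vector."""
    dot = sum(a * b for a, b in zip(t_est, t_true))
    norm_est = math.sqrt(sum(a * a for a in t_est))
    norm_true = math.sqrt(sum(b * b for b in t_true))
    if norm_est == 0 or norm_true == 0:
        raise ValueError("translation vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_est * norm_true)))
    return math.degrees(math.acos(cosine))
```

Two estimates that differ only in magnitude give an angle of zero, so the measure does not penalise the unknown global scale.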
Using the DIAC, we developed a linear self-calibration and metric reconstruction method. Two theorems describe the relative configuration of the two recovered solutions and support the use of the Cheirality condition for solution disambiguation. Comparisons with the Kruppa equations and the 5P algorithm revealed that our method performs similarly to these standard approaches. Subsequently, we showed that the large number of estimates produced by our self-calibration and metric reconstruction method can be utilised through averaging methods, shifting our focus from choosing the best solution (e.g. by finding an optimised estimate prior to self-calibration) to finding the best solution-averaging method. We also developed a general method to verify point matches between images, which can be solved by reduction to LCS. This verification method can be used in any problem that takes image correspondences as input. It successfully rejected outliers in both architectural and general scenes, with more success in the former category. All our methods were integrated into a full multiple-view reconstruction pipeline that produces visually high-quality reconstructions both on standard datasets and on image sets we shot using a conventional camera. Multi-view reconstructions were obtained by combining camera pair reconstructions using rotation averaging algorithms and a novel approach to averaging focal length estimates.
Appendix A Gaussian elimination in self-calibration and metric reconstruction equations
To simplify the expressions, we introduce the notation
and permute elements with the permutation
We denote the permuted vector by and the corresponding system matrix by . Using this notation, we write as
where are vectors and are appropriate constants of no special structure.
We aim to eliminate the elements in the rows and columns of , which we refer to as , and then to apply regular Gaussian elimination. This is generally possible, owing to the structure of rows in (23), which are linear combinations of vectors and, also, using the canonical projective reconstruction allows us to substitute
Thus, the elimination of elements is now straightforward by applying row-operations to matrix . We then apply ordinary Gaussian elimination to reduce to the form of (10).
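The final step, ordinary Gaussian elimination by row operations, can be illustrated with a generic sketch. It does not reproduce the specific system matrix or the target form of (10); the function name is hypothetical, and exact rational arithmetic is used only to keep the example free of rounding issues.

```python
from fractions import Fraction


def row_reduce(rows):
    """Reduce a matrix (list of rows) to reduced row echelon form."""
    a = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[pivot_row], a[pivot] = a[pivot], a[pivot_row]   # row swap
        scale = a[pivot_row][col]
        a[pivot_row] = [v / scale for v in a[pivot_row]]  # normalise the pivot to 1
        for r in range(n_rows):
            if r != pivot_row and a[r][col] != 0:
                factor = a[r][col]
                # Subtract a multiple of the pivot row to clear this column.
                a[r] = [x - factor * y for x, y in zip(a[r], a[pivot_row])]
        pivot_row += 1
    return a
```

Applied to an augmented matrix of a non-singular linear system, the last column of the result holds the solution.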
Appendix B Geometric relations between the two recovered solutions for metric reconstruction of a camera pair
The center of projection of a camera has no image, as the projection matrix maps it to the zero vector. Equivalently, the center of projection is a right null-vector of the projection matrix.
Let denote a projection matrix. can be decomposed as
Assume a projection matrix
Let the vector denote the third row of . Then the vector
is in the direction of the principal axis (the viewing direction) of and is directed towards the front of the camera.
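The two facts above, the center of projection as a right null-vector and the direction of the principal axis, can be checked numerically with the following sketch. The symbols were lost in this copy of the text, so the names used here are assumptions: `P` is a 3x4 projection matrix, `M` its left 3x3 block, and the principal axis is taken as det(M) times the third row of M, following the standard result in (hartley2003multiple).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def camera_center(P):
    """Homogeneous right null-vector of a 3x4 projection matrix P.

    Each coordinate is a signed 3x3 minor of P obtained by dropping one column.
    """
    def minor(skip):
        return [[row[j] for j in range(4) if j != skip] for row in P]
    return [det3(minor(0)), -det3(minor(1)), det3(minor(2)), -det3(minor(3))]


def principal_axis(P):
    """Vector along the principal axis, pointing towards the front of the camera."""
    M = [row[:3] for row in P]
    d = det3(M)
    # Scaling by det(M) makes the direction independent of the sign of P.
    return [d * v for v in M[2]]
```

For instance, for a camera with identity rotation and calibration whose center is at (1, 2, 3), the recovered center is proportional to (1, 2, 3, 1), and the principal axis is (0, 0, 1) whether `P` or `-P` is supplied.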
The next two lemmas describe properties of metric reconstructions derived from Eq. (17).
be the projection matrices for camera derived from Eq. (17).
The form of homography (4)