I Introduction
Camera pose estimation is one of the oldest and most important problems in 3D computer vision; its purpose is to find the transformation between two reference frames. This problem is central to several applications in robotics and computer vision, such as navigation, localization and mapping, and augmented reality.
Pose problems can be divided into two categories: absolute and relative. In the absolute pose problem, the goal is to find the transformation parameters (rotation and translation) from the world's to the camera's reference frame, using a given set of correspondences between features in the world and their images. In contrast, the relative pose problem aims at finding the transformation between two camera coordinate systems, from a set of correspondences between the projections of the same features in both cameras. In addition, cameras can be modeled by the perspective model [1, 2], known as central cameras, or by the general camera model [3, 4, 5], here denoted as noncentral cameras. We have noticed that, in the literature, the four cases mentioned above have in general been treated separately, with each case solved by a specific method (a scheme of those specific configurations is shown in Fig. 5). In this paper, we aim at proposing a general framework for solving general pose problems (i.e. absolute/relative using central/noncentral cameras).
In addition to the central/noncentral and absolute/relative cases, pose problems may use minimal or nonminimal data. While the latter consists in estimating the best rotation and translation parameters that fit a specific pose, the former is important for robust random sample consensus techniques (such as RANSAC [6]). Minimal solutions aim at providing very fast solutions (achieved by using the minimal data necessary to compute a solution), and their goal is to obtain a solution that is robust to outliers, rather than giving the best solution for the inliers within the dataset. This means that, even in an environment with outliers, it is important to run nonminimal techniques after getting the inliers from a RANSAC technique, in order to obtain the best solution.
Since pose estimation is one of the most studied problems in 3D vision, there are several distinct algorithms in the literature for each of these problems. When considering absolute pose problems (see Figs. 5(a) and 5(b) for the case of 3D points and their respective images), there are solutions using minimal data, with both point and line correspondences, for central cameras [7, 8, 9, 10] and for noncentral cameras [11, 12, 13, 14, 15]; and solutions using nonminimal data, with both points and lines, for central cameras [16, 17, 18] and for noncentral cameras [19, 20, 21, 22, 23, 24].
When considering relative pose problems (see Figs. 5(c) and 5(d)), there are several solutions for the central camera model, both using minimal data [25, 26, 27, 28] and nonminimal data [29, 30, 31]; and for general noncentral cameras, using minimal data [32, 33] and nonminimal data (with both points and lines) [34, 35, 36]. An interesting minimal method that combines both the absolute and relative problems was recently proposed in [37].
In this paper, we are interested in nonminimal solvers, i.e. we assume that, if necessary, a RANSAC technique has already been used to get the best inliers before applying our method. We propose a more generic and simpler approach that can be used in all pose problems. We do so by formulating the problem as an optimization one; it is then only necessary to provide an expression for the objective function and its gradients with respect to the translation and rotation parameters. In the rest of this section we describe the problem, the challenges, and our contributions. In Sec. II we present our framework and the involved algorithms. Sec. III shows some applications in which we use the proposed framework, and the experimental results are presented in Sec. IV. Conclusions are drawn in Sec. V.
I-A Problem Statement and Challenges
In its simplest and most general form, the solution to any pose problem (whether it is absolute/relative for central/noncentral cameras) satisfies
(1) $e_i(\mathbf{R}, \mathbf{t}, \mathcal{D}_i) = 0, \quad \forall\, i \in \{1, \dots, N\}$,
where $e_i(\cdot)$ are the geometric/algebraic residuals, $\mathbf{R} \in SO(3)$ is the rotation matrix, $\mathbf{t} \in \mathbb{R}^3$ is the translation vector, $N$ is the total number of correspondences, and $\mathcal{D}_i$ is the known data related with the $i$-th correspondence. This data may involve correspondences between 3D projection lines (relative pose problems) or between projection lines and 3D points (absolute pose problems). Under this formulation, any problem can then be stated as
(2) $\{\mathbf{R}^*, \mathbf{t}^*\} = \arg\min_{\mathbf{R}, \mathbf{t}} \sum_{i=1}^{N} e_i(\mathbf{R}, \mathbf{t}, \mathcal{D}_i)^2, \quad \text{subject to } \mathbf{R} \in SO(3).$
In terms of challenges, this problem is, in general, difficult due to its nonlinearities:
1) each residual $e_i(\mathbf{R}, \mathbf{t}, \mathcal{D}_i)$ is usually a high-degree polynomial with monomials combining the nine elements of the rotation matrix and the translation vector; and
2) the constraint $\mathbf{R} \in SO(3)$, i.e. $\mathbf{R}^T \mathbf{R} = \mathbf{I}$ with $\det(\mathbf{R}) = 1$, corresponds to nine nonlinear quadratic constraints.
I-B Our Contributions and Outline of the Paper
We propose a framework to solve absolute/relative pose problems, for central/noncentral camera models, using an Alternating Minimization Method (AMM), and define the corresponding optimization models. The proposed framework requires as inputs:
1) the residuals $e_i$ upon which the objective function is obtained; and
2) the specific objective function expression and its Euclidean gradients w.r.t. the rotation ($\nabla_{\mathbf{R}} f$) and the translation ($\nabla_{\mathbf{t}} f$).
Both inputs come from the geometry of the problems and from the derivatives of the objective function. There is no need for complex simplifications or for the additional dedicated solvers that have been used to solve this type of problem in the literature. This is tackled in Sec. II.
To sum up, the main contributions of this paper are:
1) the use of an AMM to relax the high-degree polynomials associated with the residuals and their respective constraints (first challenge presented in the previous subsection);
2) steepest-descent-based algorithms to find the optimal rotation and translation parameters; and
3) an evaluation of the proposed technique using synthetic and real data, showing that, despite its simple formulation, it significantly improves the computational time when compared with state-of-the-art techniques.
II Solving Pose Problems Using Alternating Minimization
This section presents our generic framework. We start by describing the Alternating Minimization theory (Sec. II-A), and then propose an algorithm to solve a general pose problem (Sec. II-B). Finally, Sec. II-C presents the solvers used in the framework.
II-A Alternating Minimization Method (AMM)
The goal of an AMM [39, 40] is to find the minimum of a given objective function $f(\mathbf{x}, \mathbf{y})$ depending on two variables $\mathbf{x}$ and $\mathbf{y}$, where $\mathbf{x}$ belongs to a given set $\mathcal{X}$, and $\mathbf{y}$ to $\mathcal{Y}$. According to [40], the AMM may be formulated as
(3) $\min_{\mathbf{x} \in \mathcal{X},\, \mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$,
where $\mathcal{X}$ and $\mathcal{Y}$ are the sets of variables, and $f$ is the function to be minimized. The strategy is to fix one of the variables and solve the resulting optimization problem for the other, in an iterative way. Then, in each iteration $k$, there are two distinct problems that need to be solved:
(4) $\mathbf{x}^{(k)} = \arg\min_{\mathbf{x} \in \mathcal{X}} f\big(\mathbf{x}, \mathbf{y}^{(k-1)}\big)$
(5) $\mathbf{y}^{(k)} = \arg\min_{\mathbf{y} \in \mathcal{Y}} f\big(\mathbf{x}^{(k)}, \mathbf{y}\big)$,
starting with a suitable initial guess $\mathbf{y}^{(0)}$. The stopping condition for the iterative cycle is
(6) $\big|f\big(\mathbf{x}^{(k)}, \mathbf{y}^{(k)}\big) - f\big(\mathbf{x}^{(k-1)}, \mathbf{y}^{(k-1)}\big)\big| < \tau \quad \text{or} \quad k > k_{\max}$,
where $\tau$ is a threshold for the absolute value of the variation of the objective function in two consecutive iterations, and $k_{\max}$ is the maximum number of iterations allowed.
II-B AMM for Pose Problems
Since a pose estimation problem aims at finding a rotation matrix $\mathbf{R}$ and a translation vector $\mathbf{t}$, the AMM variables $\mathbf{x}$ and $\mathbf{y}$ are set to $\mathbf{R}$ and $\mathbf{t}$, respectively. In order to use the AMM to solve these problems, we need: 1) an expression for the objective function and its gradients; and 2) solvers for the two minimization problems.
Let us consider a generic pose problem, as shown in (2). Depending on the problem, the data can be 2D-2D or 3D-2D correspondences (relative or absolute pose, respectively). Then, we can use the method presented in Sec. II-A to solve the problem: an iterative method which starts by taking an initial guess on the translation ($\mathbf{t}^{(0)}$) and solving for $\mathbf{R}$:
(7) $\mathbf{R}^{(k)} = \arg\min_{\mathbf{R} \in SO(3)} f\big(\mathbf{R}, \mathbf{t}^{(k-1)}\big)$,
yielding an estimate for the rotation matrix, which will be plugged into
(8) $\mathbf{t}^{(k)} = \arg\min_{\mathbf{t} \in \mathbb{R}^3} f\big(\mathbf{R}^{(k)}, \mathbf{t}\big)$.
This process is repeated for all new estimates $\{\mathbf{R}^{(k)}, \mathbf{t}^{(k)}\}$, until the stopping condition of (6) is met. An overview of the proposed method is given in Algorithm 1. As a framework, at this stage, one has only to provide the objective function $f$, which depends on the specific pose problem to be solved. Below, we present two efficient techniques to solve (7) and (8).
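As an illustration of the outer loop of Algorithm 1, the following C++ sketch alternates between the two sub-problems until the stopping condition (6) is met. It is only a minimal sketch: the type aliases, function names, and default tolerances are our own placeholders, not the interface of the released implementation.

#include <Eigen/Dense>
#include <cmath>
#include <functional>
#include <utility>

using Rot = Eigen::Matrix3d;
using Vec = Eigen::Vector3d;
using Objective = std::function<double(const Rot&, const Vec&)>;
using RotSolver = std::function<Rot(const Rot&, const Vec&)>;  // solves (7)
using VecSolver = std::function<Vec(const Rot&, const Vec&)>;  // solves (8)

// Outer AMM loop (Algorithm 1): starting from a translation guess t and an
// arbitrary rotation (e.g. identity), alternate (7) and (8) until the decrease
// of f is below a tolerance or the iteration budget is exhausted, cf. (6).
std::pair<Rot, Vec> ammPose(const Objective& f, const RotSolver& solveR,
                            const VecSolver& solveT, Rot R, Vec t,
                            double tol = 1e-6, int maxIter = 100) {
  double prev = f(R, t);
  for (int k = 0; k < maxIter; ++k) {
    R = solveR(R, t);   // (7): minimize over R in SO(3), with t fixed
    t = solveT(R, t);   // (8): minimize over t in R^3, with R fixed
    const double cur = f(R, t);
    if (std::abs(prev - cur) < tol) break;  // stopping condition (6)
    prev = cur;
  }
  return {R, t};
}

The two inner solvers (Algorithms 2 and 3) can be any routines that decrease the objective for a fixed translation or rotation; a possible sketch of them is given in Sec. II-C below.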
II-C Efficient Solvers for the AMM Sub-Problems (7) and (8)
To ease the notation, we consider $f_{\bar{\mathbf{t}}}(\mathbf{R}) = f(\mathbf{R}, \bar{\mathbf{t}})$ and $f_{\bar{\mathbf{R}}}(\mathbf{t}) = f(\bar{\mathbf{R}}, \mathbf{t})$, where $\bar{\mathbf{R}}$ and $\bar{\mathbf{t}}$ represent constant rotation and translation parameters.
Efficient solution to (7): We use a steepest descent algorithm for unitary matrices [41, 42] that does not consider the unitary constraints explicitly. This is achieved by iterating on the rotation manifold. At the beginning of each iteration, we compute the Riemannian gradient, which is a skew-symmetric matrix. Geometrically, it corresponds to the axis from which a rotation step will be calculated. Then, we find the angle that, together with this axis, defines the rotation step applied to the current rotation so as to reduce the value of the objective function. The details are described in Algorithm 2.
Efficient solution to (8): We use another algorithm of steepest descent type [41]. In each iteration, the translation gradient is calculated and multiplied by a step coefficient; the result is then added to the current translation. In this way, the solver converges to a translation vector that minimizes the function for a given rotation matrix. Details are shown in Algorithm 3.
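A simplified sketch of the two inner solvers is given below. The Riemannian gradient and the geodesic (axis-angle) update follow the general scheme of [41, 42], but the step-size search shown here (a plain halving backtracking with an assumed initial step) is only illustrative; the actual rules of Algorithms 2 and 3 may differ.

#include <Eigen/Dense>

using Rot = Eigen::Matrix3d;
using Vec = Eigen::Vector3d;

// Geodesic steepest descent on SO(3) (sketch of Algorithm 2): from the
// Euclidean gradient G, form the skew-symmetric Riemannian gradient
// Z = G R^T - R G^T, read off its rotation axis, and move along exp(-mu Z).
template <class Obj, class GradR>
Rot rotationStep(const Obj& f, const GradR& gradR, Rot R, const Vec& t,
                 int iters = 20, double mu = 1.0) {
  for (int i = 0; i < iters; ++i) {
    const Rot G = gradR(R, t);
    const Rot Z = G * R.transpose() - R * G.transpose();  // skew-symmetric
    const Vec w(Z(2, 1), Z(0, 2), Z(1, 0));               // axis (scaled by angle)
    if (w.norm() < 1e-12) break;                           // stationary point
    double step = mu;
    while (step > 1e-12) {                                 // backtracking on the angle
      const Rot Rnew =
          Eigen::AngleAxisd(-step * w.norm(), w.normalized()).toRotationMatrix() * R;
      if (f(Rnew, t) < f(R, t)) { R = Rnew; break; }
      step *= 0.5;
    }
  }
  return R;
}

// Plain steepest descent on the translation (sketch of Algorithm 3).
template <class Obj, class GradT>
Vec translationStep(const Obj& f, const GradT& gradT, const Rot& R, Vec t,
                    int iters = 20, double mu = 1.0) {
  for (int i = 0; i < iters; ++i) {
    const Vec g = gradT(R, t);
    if (g.norm() < 1e-12) break;
    double step = mu;
    while (step > 1e-12) {                                 // backtracking on the step
      const Vec tnew = t - step * g;
      if (f(R, tnew) < f(R, t)) { t = tnew; break; }
      step *= 0.5;
    }
  }
  return t;
}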
Keep in mind that Algorithms 1, 2, and 3, upon which our general framework is based, only require the objective function (which depends on the pose problem) and its gradients $\nabla_{\mathbf{R}} f$ and $\nabla_{\mathbf{t}} f$.
To illustrate the simplicity of our framework in solving pose problems, we present, in the next section, three different applications, i.e. we explain how the framework is applied to three different objective functions.
Our framework was implemented in C++ within the OpenGV framework. The code is available on the authors' webpage.
III Applications of Our Framework
This section presents three applications of the proposed framework to solve: a relative pose problem (Sec. III-A) and two absolute pose problems (Secs. III-B and III-C).
III-A General Relative Pose Problem
A relative pose problem consists in estimating the rotation and translation parameters that ensure the intersection of the 3D projection rays from a pair of cameras. Formally, using the Generalized Epipolar constraint [43], for a set of correspondences between left and right inverse projection rays (which set up the data), we can define the objective function as
(9) $f(\mathbf{R}, \mathbf{t}) = \sum_{i=1}^{N} \left(\mathbf{u}_i^T \begin{bmatrix}\mathbf{e} \\ \mathbf{r}\end{bmatrix}\right)^2$,
where $\mathbf{u}_i$ is a vector that depends on the data of the $i$-th correspondence, and $\mathbf{e}$ and $\mathbf{r}$ are vectors built from the stacked columns of the essential [44] and the rotation matrices, respectively.
The expressions of the gradients $\nabla_{\mathbf{R}} f$ and $\nabla_{\mathbf{t}} f$ are computed directly from (9):
(10)
where the involved matrices, due to space limitations, are given in the supplementary material (check the authors' webpage).
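For intuition, the objective of this section can also be evaluated directly from the generalized epipolar constraint [43], as in the sketch below. The Plücker-line convention (direction plus moment) and the frame ordering are assumptions made for illustration; the actual data vectors $\mathbf{u}_i$ used by the solver are built as described in the supplementary material.

#include <Eigen/Dense>
#include <vector>

struct PluckerLine {            // 3D projection ray: direction q and moment qp = c x q
  Eigen::Vector3d q, qp;
};

// Skew-symmetric matrix [t]_x such that [t]_x v = t x v.
inline Eigen::Matrix3d skew(const Eigen::Vector3d& t) {
  Eigen::Matrix3d S;
  S <<     0.0, -t.z(),  t.y(),
        t.z(),    0.0, -t.x(),
       -t.y(),  t.x(),    0.0;
  return S;
}

// Sum of squared generalized epipolar residuals. For each correspondence,
//   q2' . (R q1) + q2 . (R q1') + q2 . ([t]_x R q1)
// vanishes when the two projection rays intersect.
double relativePoseObjective(const std::vector<PluckerLine>& cam1,
                             const std::vector<PluckerLine>& cam2,
                             const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
  const Eigen::Matrix3d E = skew(t) * R;   // essential-matrix block
  double f = 0.0;
  for (size_t i = 0; i < cam1.size(); ++i) {
    const double r = cam2[i].qp.dot(R * cam1[i].q) +
                     cam2[i].q.dot(R * cam1[i].qp) +
                     cam2[i].q.dot(E * cam1[i].q);
    f += r * r;
  }
  return f;
}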
III-B General Absolute Pose Problem
This section addresses the application of the proposed framework to the general absolute pose problem, i.e. for a set of known correspondences between 3D points and their respective inverse generic projection rays (as presented in [3, 4, 5]), which set up the data. We consider the geometric distance between a point in the world and its projection ray presented in [45, 38]. After some simplifications, we get an objective function of the form
(11) $f(\mathbf{R}, \mathbf{t}) = \mathbf{r}^T\mathbf{M}\,\mathbf{r} + \mathbf{t}^T\mathbf{A}\,\mathbf{t} + 2\,\mathbf{t}^T\mathbf{B}\,\mathbf{r} + 2\,\mathbf{m}^T\mathbf{r} + 2\,\mathbf{a}^T\mathbf{t} + c$, with $\mathbf{r} = \text{vec}(\mathbf{R})$,
where the matrices $\mathbf{M}$, $\mathbf{A}$, $\mathbf{B}$, the vectors $\mathbf{m}$, $\mathbf{a}$, and the scalar $c$ depend on the data. Again, due to space limitations, these parameters are given in the supplementary material. The gradients are easily obtained [46]:
(12) $\nabla_{\mathbf{r}} f = 2\,\mathbf{M}\,\mathbf{r} + 2\,\mathbf{B}^T\mathbf{t} + 2\,\mathbf{m}$
(13) $\nabla_{\mathbf{t}} f = 2\,\mathbf{A}\,\mathbf{t} + 2\,\mathbf{B}\,\mathbf{r} + 2\,\mathbf{a}$.
Although not necessary for our framework, one important advantage of having (11), (12), and (13) in this form, instead of the more general formulation of (1), is that the calculation of the objective function and its gradients does not depend on the number of points: after a single pass over the data to build the above parameters, each evaluation has complexity $O(1)$ instead of $O(N)$.
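The sketch below shows one way to obtain such a constant-cost evaluation for the geometric distance of [45, 38], by accumulating all data-dependent terms once. The exact matrices used in the paper are defined in its supplementary material, so the accumulator names and the particular expansion used here are illustrative assumptions.

#include <Eigen/Dense>
#include <unsupported/Eigen/KroneckerProduct>
#include <vector>

// Coefficients of the quadratic form in (vec(R), t). Once built, evaluating
// the objective (and its gradients) no longer depends on the number of points.
struct AbsPoseAccum {
  Eigen::Matrix<double, 9, 9> Mrr = Eigen::Matrix<double, 9, 9>::Zero();
  Eigen::Matrix<double, 3, 9> Mtr = Eigen::Matrix<double, 3, 9>::Zero();
  Eigen::Matrix3d A = Eigen::Matrix3d::Zero();
  Eigen::Matrix<double, 9, 1> br = Eigen::Matrix<double, 9, 1>::Zero();
  Eigen::Vector3d bt = Eigen::Vector3d::Zero();
  double c0 = 0.0;
};

// One-time O(N) pass over the data: 3D points p_i, unit ray directions d_i,
// and ray origins (camera centers) c_i, all for the generalized camera.
AbsPoseAccum accumulate(const std::vector<Eigen::Vector3d>& p,
                        const std::vector<Eigen::Vector3d>& d,
                        const std::vector<Eigen::Vector3d>& c) {
  AbsPoseAccum s;
  for (size_t i = 0; i < p.size(); ++i) {
    const Eigen::Matrix3d Q = Eigen::Matrix3d::Identity() - d[i] * d[i].transpose();
    const Eigen::Matrix3d P = p[i] * p[i].transpose();
    const Eigen::Vector3d Qc = Q * c[i];
    const Eigen::Matrix<double, 9, 9> kRR = Eigen::kroneckerProduct(P, Q);
    const Eigen::Matrix<double, 3, 9> kTR = Eigen::kroneckerProduct(p[i].transpose(), Q);
    const Eigen::Matrix<double, 9, 1> kBr = Eigen::kroneckerProduct(p[i], Qc);
    s.Mrr += kRR;
    s.Mtr += kTR;
    s.A   += Q;
    s.br  -= kBr;
    s.bt  -= Qc;
    s.c0  += c[i].dot(Qc);
  }
  return s;
}

// O(1) evaluation of sum_i ||(I - d_i d_i^T)(R p_i + t - c_i)||^2.
double absPoseObjective(const AbsPoseAccum& s, const Eigen::Matrix3d& R,
                        const Eigen::Vector3d& t) {
  const Eigen::Map<const Eigen::Matrix<double, 9, 1>> r(R.data());  // vec(R)
  return r.dot(s.Mrr * r) + 2.0 * t.dot(s.Mtr * r) + t.dot(s.A * t) +
         2.0 * s.br.dot(r) + 2.0 * s.bt.dot(t) + s.c0;
}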
III-C General Absolute Pose Problem Using the UPnP Metric
In Sec. III-B, the geometric distance was used to derive an objective function. In the present case we derive a function based on [21] (the well-known method denoted as UPnP). The starting point is the constraint
(14) $\lambda_i\,\mathbf{f}_i = \mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{v}_i$,
where $\lambda_i$ represents the depth, $\mathbf{v}_i$ is a vector from the origin of the camera's reference frame to a point on the ray, $\mathbf{f}_i$ represents the ray's direction, and $\mathbf{p}_i$ is a point in the world's reference frame. Eliminating the depths results in an objective function with the same format as (11), but the involved matrices, vectors, and scalar do not depend on the data in the same way as in the previous case (details are also provided in the supplementary material).
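As a brief illustration of how the depths can be eliminated (an assumption about the intermediate step, since the paper defers the details to the supplementary material): assuming unit-norm ray directions, $\|\mathbf{f}_i\| = 1$, multiplying (14) on the left by $\mathbf{f}_i^T$ gives $\lambda_i = \mathbf{f}_i^T(\mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{v}_i)$, and substituting this value back into (14) yields the depth-free residual $\mathbf{r}_i = (\mathbf{I} - \mathbf{f}_i\mathbf{f}_i^T)(\mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{v}_i)$. Summing the squared norms of these residuals over all correspondences produces a function that is again quadratic in $\text{vec}(\mathbf{R})$ and $\mathbf{t}$, i.e. with the same format as (11).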
IV Results
This section presents several results on the evaluation and validation of the proposed framework on the pose problems of Sec. III. The code, developed in C++ within the OpenGV framework [47], will be made public. We start by evaluating the methods using synthetic data (Sec. IV-A), and conclude this section with the real-data experimental results (Sec. IV-B).
IV-A Results with Synthetic Data
This section aims at evaluating our framework (Sec. II) on the applications presented in Sec. III, using synthetic data. More specifically, we use the OpenGV toolbox (in which the state-of-the-art techniques were already available). Due to space limitations, we refer to [47] for the details on the dataset generation.



We start with the relative pose problem addressed in Sec. III-A, here denoted as AMM, for which the results are presented in Fig. 8. We consider the current state-of-the-art techniques: ge (Kneip et al. [48]); the 17 pt (Li et al. [34]); and the nonlinear (Kneip et al. [21]). We use randomly generated data, with noise varying from 0 to 10. For each level of noise, we generate 200 random trials with 20 correspondences between lines in the two camera reference frames, and compute the mean of the errors: 1) the Frobenius norm of the difference between the ground-truth and estimated rotation matrices; and 2) the norm of the difference between the ground-truth and the estimated translation vectors. In addition, we store and compute the mean of the computation time required for all the trials. The results for the errors are shown in Fig. 8(a) and for the computation time in Fig. 8(b).
Next, we evaluate the techniques of Secs. III-B and III-C for the estimation of the camera's absolute pose. Again, we consider the OpenGV toolbox to generate the data; the metrics used were the same as before, as well as the number of trials and correspondences. In this case we consider both the central and the noncentral cases, with results shown in Figs. 11 and 14, respectively. In addition to the methods of Secs. III-B and III-C (denoted as AMM (gpnp) and AMM (upnp), respectively), for the central case we consider: p3p (ransac) (a closed-form solution using minimal data [7] within the RANSAC framework); epnp (presented in [16]); and upnp and nonlinear (shown in [21]), which are state-of-the-art techniques to compute the camera's absolute pose. For the noncentral case, we considered: gpnp (presented in [38]); upnp (proposed by Kneip et al. [21]); gp3p (ransac) (a minimal solution [11] used within the RANSAC framework); and the non linear method presented in [21].
As the initial guess for the translation, required by our framework (Algorithm 1), we use the solution given by the respective minimal solvers for the absolute pose problems, and the solution given by the linear 17 pt with the minimum required number of points for the relative pose problem (note that this solution with the minimum number of points is significantly faster than the one shown in Fig. 8(b) for the 17 pt, which uses all the available points). These estimates are very sensitive to noise but very fast, and are therefore suitable as a first estimate.

For the relative case (Fig. 8), the rotation error of our method is close to that of ge for each noise level, while requiring significantly less computation time. The non linear algorithm has the best accuracy, but its computation time is one order of magnitude higher than that of all the other algorithms considered.
For the central absolute pose (Fig. 11), upnp and non linear present the same or higher accuracy than our method, but their computation time is one order of magnitude higher (around 10 times slower). The epnp algorithm's computation time is similar to ours, but it is significantly less accurate, while the minimal case with RANSAC (p3p (ransac)) is both slower and less accurate than ours. For the general noncentral absolute pose case (Fig. 14), the conclusions concerning upnp and non linear are the same as before, and likewise for the minimal case within the RANSAC framework, gp3p (ransac).
From these results, one can conclude that the AMM framework proposed in this paper performs favorably compared with the other methods, despite the fact that it involves an iterative sequence of simple optimization steps.
IV-B Results with Real Data
For the experiments with real data, we considered a noncentral multi-perspective imaging device, consisting of three perspective cameras with non-overlapping fields of view (see the setup in Fig. 19). Datasets such as KITTI [49] usually consider stereo systems in which the cameras are aligned with the moving direction of the vehicle. In such cases, when we find the correspondences between two images, the projection lines associated with pixels corresponding to the projection of the same world point become nearly the same, making it difficult to recover the pose using the epipolar constraint (a degenerate configuration). This new dataset was acquired to avoid such degenerate configurations.
Images were acquired synchronously (using the ROS toolbox, http://www.ros.org/) along a walking path of around 200 meters (see examples of these images in Fig. 19). To get the 2D-to-3D correspondences, we use the VisualSFM framework [50, 51].
The cameras' intrinsic parameters were computed offline. The correspondences between image pixels that are the images of 3D points are converted into 3D projection lines, using the corresponding camera's parameters and the cameras' transformations w.r.t. each other. The bearing vectors (the directions of the projection rays) and the camera centers w.r.t. the imaging coordinate system, together with the 3D points, are given as input data to the framework and used to compute the absolute pose.
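This conversion from pixels to projection rays is the standard pinhole back-projection composed with each camera's pose in the rig, as sketched below. All names are hypothetical and do not correspond to the dataset or toolbox code.

#include <Eigen/Dense>

// Back-project a pixel into a 3D projection ray expressed in the rig frame.
// K is the camera's intrinsic matrix, and (R_cam, c_cam) is the camera's pose
// w.r.t. the imaging (rig) coordinate system.
struct Ray {
  Eigen::Vector3d direction;  // unit bearing vector in the rig frame
  Eigen::Vector3d origin;     // camera center in the rig frame
};

Ray pixelToRay(const Eigen::Vector2d& px, const Eigen::Matrix3d& K,
               const Eigen::Matrix3d& R_cam, const Eigen::Vector3d& c_cam) {
  const Eigen::Vector3d homog(px.x(), px.y(), 1.0);
  Ray ray;
  ray.direction = (R_cam * (K.inverse() * homog)).normalized();
  ray.origin = c_cam;
  return ray;
}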
In this experiment, the following state-of-the-art methods were considered: gpnp, presented in [38]; upnp [21]; the minimal solution gp3p [11]; and the non linear method [21]. Looking at the estimated trajectories in Fig. 19, it is possible to conclude that all methods recover the path.
In terms of results, for the central case the following times were obtained: non linear 18.00 s; upnp 0.75 s; epnp 0.07 s; amm (gpnp) 0.11 s; and amm (upnp) 0.11 s (the values are the sum of the times computed along the path). For the general noncentral case the following times were obtained: non linear 37.59 s; upnp 1.14 s; amm (gpnp) 0.17 s; and amm (upnp) 0.27 s.
These results are in accordance with the conclusions of the previous subsection. Because of its simplicity, the proposed framework solves these problems faster than current state-of-the-art approaches designed to solve specific pose problems.
V Discussion
In this paper, we have proposed a general framework for solving pose problems. Instead of considering each case individually, we start from a general formulation of this kind of problem and aim at a framework for solving any pose problem. We state the problem as an optimization one, in which we use an alternating minimization strategy to relax the constraints associated with the nonlinearities of the objective function.
Our framework comes with three different algorithms that were optimized for pose estimation purposes. As inputs, in addition to the data, the proposed framework requires an objective function (which depends on the considered residuals and data) and its respective gradients w.r.t. the rotation and translation parameters. It is therefore very easy to use because: 1) there is no need to eliminate unknown variables to relax the optimization process; and 2) no problem-specific solvers are needed. The framework was included in the OpenGV library and will be made available to the community (check the authors' webpage).
In terms of experimental results, we ran several tests using both synthetic and real data. The main conclusion is that, although the framework is general (in the sense that its solvers aim at solving any pose problem) and very easy to use (it requires little information about the chosen metric), the sensitivity to noise is not affected (note that this depends on the chosen residual formulation), while the method is considerably faster.
References
 [1] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
 [2] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An Invitation to 3D Vision: From Images to Geometric Models. Springer-Verlag, 2003.
 [3] M. D. Grossberg and S. K. Nayar, “A general imaging model and a method for finding its parameters,” in IEEE Int’l Conf. Computer Vision (ICCV), vol. 2, 2001, pp. 108–115.
 [4] P. Sturm and S. Ramalingam, “A generic concept for camera calibration,” in European Conf. Computer Vision (ECCV), 2004, pp. 1–13.
 [5] P. Miraldo, H. Araujo, and J. Queiro, “Pointbased calibration using a parametric representation of general imaging models,” in IEEE Int’l Conf. Computer Vision (ICCV), 2011, pp. 2304–2311.
 [6] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

 [7] L. Kneip, D. Scaramuzza, and R. Siegwart, “A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2969–2976.
 [8] S. Ramalingam, S. Bouaziz, and P. Sturm, “Pose estimation using both points and lines for geo-localization,” in IEEE Int’l Conf. Robotics and Automation (ICRA), 2011, pp. 4716–4723.
 [9] T. Ke and S. I. Roumeliotis, “An efficient algebraic solution to the perspective-three-point problem,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7225–7233.
 [10] P. Wang, G. Xu, Z. Wang, and Y. Cheng, “An efficient solution to the perspective-three-point pose problem,” Computer Vision and Image Understanding (CVIU), vol. 166, pp. 81–87, 2018.
 [11] D. Nister, “A minimal solution to the generalised 3-point pose problem,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, 2004, pp. 560–567.
 [12] P. Miraldo and H. Araujo, “A simple and robust solution to the minimal general pose estimation,” in IEEE Int’l Conf. Robotics and Automation (ICRA), 2014, pp. 2119–2125.
 [13] J. Ventura, C. Arth, G. Reitmayr, and D. Schmalstieg, “A minimal solution to the generalized pose-and-scale problem,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014, pp. 422–429.
 [14] G. H. Lee, “A minimal solution for non-perspective pose estimation from line correspondences,” in European Conf. Computer Vision (ECCV), 2016, pp. 170–185.
 [15] P. Miraldo, T. Dias, and S. Ramalingam, “A minimal closed-form solution for multi-perspective pose estimation using points and lines,” in European Conf. Computer Vision (ECCV), 2018, pp. 490–507.
 [16] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int’l J. Computer Vision (IJCV), vol. 81, no. 2, pp. 578–589, 2009.
 [17] J. A. Hesch and S. I. Roumeliotis, “A direct least-squares (DLS) method for PnP,” in IEEE Int’l Conf. Computer Vision (ICCV), 2011, pp. 383–390.
 [18] Y. Zheng, Y. Kuang, S. Sugimoto, K. Åström, and M. Okutomi, “Revisiting the PnP problem: A fast, general and optimal solution,” in IEEE Int’l Conf. Computer Vision (ICCV), 2013, pp. 2344–2351.
 [19] C. Sweeney, V. Fragoso, T. Höllerer, and M. Turk, “gDLS: A scalable solution to the generalized pose and scale problem,” in European Conf. Computer Vision (ECCV), 2014, pp. 16–31.
 [20] L. Kneip, P. Furgale, and R. Siegwart, “Using multi-camera systems in robotics: Efficient solutions to the NPnP problem,” in IEEE Int’l Conf. Robotics and Automation (ICRA), 2013, pp. 3770–3776.
 [21] L. Kneip, H. Li, and Y. Seo, “UPnP: An optimal O(n) solution to the absolute pose problem with universal applicability,” in European Conf. Computer Vision (ECCV), 2014, pp. 127–142.
 [22] P. Miraldo and H. Araujo, “Planar pose estimation for general cameras using known 3d lines,” in IEEE/RSJ Int’l Conf. Intelligent Robots and Systems (IROS), 2014, pp. 4234–4240.
 [23] S. Haner and K. Åström, “Absolute pose for cameras under flat refractive interfaces,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1428–1436.
 [24] P. Miraldo, H. Araujo, and N. Gonçalves, “Pose estimation for general cameras using lines,” IEEE Trans. Cybernetics, vol. 45, no. 10, pp. 2156–2164, 2015.
 [25] D. Nistér, “An efficient solution to the five-point relative pose problem,” IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), vol. 26, no. 6, pp. 756–770, 2004.
 [26] H. Stewénius, C. Engels, and D. Nistér, “Recent developments on direct relative orientation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, no. 4, pp. 284–294, 2006.
 [27] H. Li and R. Hartley, “Five-point motion estimation made easy,” in IEEE Int’l Conf. Pattern Recognition (ICPR), 2006, pp. 630–633.

 [28] Z. Kukelova, M. Bujnak, and T. Pajdla, “Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems,” in British Machine Vision Conference (BMVC), 2008, pp. 56.1–56.10.
 [29] D. Nister, O. Naroditsky, and J. Bergen, “Visual odometry,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2004.
 [30] D. Scaramuzza and F. Fraundorfer, “Visual odometry [tutorial],” IEEE Robotics and Automation Magazine (RAM), vol. 18, no. 4, pp. 80–92, 2011.
 [31] J. Fredriksson, V. Larsson, C. Olsson, O. Enqvist, and F. Kahl, “Efficient algorithms for robust estimation of relative translation,” Image and Vision Computing (IVC), vol. 52, pp. 114–124, 2016.
 [32] H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström, “Solutions to minimal generalized relative pose problems,” in OMNIVIS, 2005.
 [33] J. Ventura, C. Arth, and V. Lepetit, “An efficient minimal solution for multi-camera motion,” in IEEE Int’l Conf. Computer Vision (ICCV), 2015, pp. 747–755.
 [34] H. Li, R. Hartley, and J.-H. Kim, “A linear approach to motion estimation using generalized camera models,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
 [35] G. H. Lee, F. Fraundorfer, and M. Pollefeys, “Motion estimation for self-driving cars with a generalized camera,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2013, pp. 2746–2753.
 [36] L. Kneip, C. Sweeney, and R. Hartley, “The generalized relative pose and scale problem: View-graph fusion via 2D-2D registration,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2016, pp. 1–9.
 [37] F. Camposeco, A. Cohen, M. Pollefeys, and T. Sattler, “Hybrid camera pose estimation,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2018, pp. 136–144.
 [38] G. Schweighofer and A. Pinz, “Globally optimal O(n) solution to the PnP problem for general camera models,” in British Machine Vision Conference (BMVC), 2008, pp. 1–10.
 [39] I. Csiszár and G. Tusnády, “Information geometry and alternating minimization procedures,” Statistics and Decisions, Supplement Issue, vol. 1, pp. 205–237, 1984.
 [40] U. Niesen, D. Shah, and G. W. Wornell, “Adaptive alternating minimization algorithms,” IEEE Trans. Information Theory, vol. 55, no. 3, pp. 1423–1429, 2009.
 [41] E. Fiesler and R. Beale, Eds., Handbook of Neural Computation, 1st ed. Oxford University Press, 1996.

 [42] T. Abrudan, J. Eriksson, and V. Koivunen, “Descent algorithms for optimization under unitary matrix constraint,” IEEE Trans. Signal Processing, vol. 56, no. 5, pp. 635–650, 2008.
 [43] R. Pless, “Using many cameras as one,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2003, pp. 1–7.
 [44] R. Hartley, “In defense of the eight-point algorithm,” IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), vol. 19, no. 6, pp. 580–593, 1997.
 [45] G. Schweighofer and A. Pinz, “Fast and globally convergent structure and motion estimation for general camera models,” in British Machine Vision Conference (BMVC), 2006, pp. 147–156.
 [46] H. Lütkepohl, Handbook of Matrices, 1st ed. John Wiley and Sons, 1996.
 [47] L. Kneip and P. Furgale, “OpenGV: A unified and generalized approach to real-time calibrated geometric vision,” in IEEE Int’l Conf. Robotics and Automation (ICRA), 2014, pp. 1–8.
 [48] L. Kneip and H. Li, “Efficient computation of relative pose for multi-camera systems,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014, pp. 446–453.
 [49] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3354–3361.
 [50] C. Wu, S. Agarwal, B. Curless, and S. M. Seitz, “Multicore bundle adjustment,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3057–3064.
 [51] C. Wu, “Towards linear-time incremental structure from motion,” in Int’l Conf. 3D Vision (3DV), 2013, pp. 127–134.
VI General Relative Pose Problem
In this section we explain in detail how the objective function (Sec. VI-A) and its gradients (Sec. VI-B), presented in Eqs. 9 and 10 of the main paper, were obtained.
VI-A Objective Function
For the objective function, we considered the Generalized Epipolar constraint [43]:
(15) $\mathbf{l}_2^T \begin{bmatrix} \mathbf{E} & \mathbf{R} \\ \mathbf{R} & \mathbf{0} \end{bmatrix} \mathbf{l}_1 = 0$,
where $\mathbf{l}_1$ and $\mathbf{l}_2$ are the Plücker coordinates of two distinct 3D projection rays that intersect, expressed in two distinct coordinate frames (let us say 1 and 2), and $\mathbf{E}$ and $\mathbf{R}$ are the essential [1] and the rotation matrices that represent the transformation between the considered coordinate frames. Applying the Kronecker product (here denoted as $\otimes$) to (15) yields
(16) $(\mathbf{l}_1 \otimes \mathbf{l}_2)^T\, \text{vec}\!\left(\begin{bmatrix} \mathbf{E} & \mathbf{R} \\ \mathbf{R} & \mathbf{0} \end{bmatrix}\right) = 0$,
where $\text{vec}(\cdot)$ corresponds to the matrix stacked column by column. Considering that some elements of the above matrix are zero, it is possible to verify that the entries in positions 22-24, 28-30, and 34-36 of the stacked vector are null. Thus, we can rewrite (16) as
(17) $\mathbf{u}_i^T \begin{bmatrix} \mathbf{e} \\ \mathbf{r} \end{bmatrix} = 0$,
in which $\mathbf{e}$ and $\mathbf{r}$ represent the essential and rotation matrices stacked column by column. The vector $\mathbf{u}_i$ has the same elements as $\mathbf{a}_i = \mathbf{l}_1 \otimes \mathbf{l}_2$ except for the ones that multiply null entries: it is obtained by eliminating the elements in positions 22-24, 28-30, and 34-36 of $\mathbf{a}_i$, and by taking the elements in positions 4-6, 10-12, and 16-18 and summing them to the elements in positions 19-21, 25-27, and 31-33, since both multiply the first, second, and third columns of $\mathbf{R}$, respectively. Now, each correspondence between $\mathbf{l}_1$ and $\mathbf{l}_2$ has an associated vector $\mathbf{u}_i$. Thus, the objective function can be written as
(18) $f(\mathbf{R}, \mathbf{t}) = \sum_{i=1}^{N} \left(\mathbf{u}_i^T \begin{bmatrix} \mathbf{e} \\ \mathbf{r} \end{bmatrix}\right)^2$,
yielding Eq. 9 of the paper.
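The index bookkeeping above can be written compactly as in the sketch below (0-based indices in code, 1-based in the text; the ordering of the Kronecker product is assumed to be $\mathbf{l}_1 \otimes \mathbf{l}_2$ as in (16)).

#include <Eigen/Dense>

// Build the 18-element data vector u_i from the 36-element Kronecker product
// a = l1 (x) l2 of the two Plücker 6-vectors: drop the entries that multiply
// the zero block and fold the entries that multiply R twice into one block.
Eigen::Matrix<double, 18, 1> reduceKronecker(const Eigen::Matrix<double, 36, 1>& a) {
  Eigen::Matrix<double, 18, 1> u;
  // Entries multiplying the essential matrix E (text positions 1-3, 7-9, 13-15).
  u.segment<3>(0) = a.segment<3>(0);
  u.segment<3>(3) = a.segment<3>(6);
  u.segment<3>(6) = a.segment<3>(12);
  // Entries multiplying R: positions 19-21, 25-27, 31-33, summed with the
  // entries at positions 4-6, 10-12, 16-18 (same columns of R).
  u.segment<3>(9)  = a.segment<3>(18) + a.segment<3>(3);
  u.segment<3>(12) = a.segment<3>(24) + a.segment<3>(9);
  u.segment<3>(15) = a.segment<3>(30) + a.segment<3>(15);
  // Positions 22-24, 28-30, 34-36 multiply the zero block and are discarded.
  return u;
}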
VI-B Gradients for $\mathbf{R}$ and $\mathbf{t}$
Here, we give the full form of the rotation gradient (here denoted as $\nabla_{\mathbf{R}} f$) and the translation gradient ($\nabla_{\mathbf{t}} f$).
The essential matrix is given by $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$, where $[\mathbf{t}]_\times$ is the skew-symmetric matrix associated with the translation vector. The explicit expression of the stacked vector results from stacking the elements of $\mathbf{E}$:
(19)
followed by the stacking of the elements of $\mathbf{R}$. The resulting vector depends on 12 distinct variables (the nine rotation and the three translation parameters). Considering the objective function as given in (18), we have
(20) 
Now, computing the derivative of the objective function with respect to a generic variable representing any element of the rotation matrix or the translation vector yields
(21)
Consider the derivatives of $\mathbf{e}$ with respect to the three elements of the translation $\mathbf{t}$:
(22)  
(23)  
(24) 
Denoting the first, second, and third columns of the rotation matrix by $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$, respectively, we can assemble the three previous equations as
(25)
where $[\cdot]_\times$ stands for the skew-symmetric matrix associated with a vector. Inserting this result in (21) leads to the gradient of the translation given in the paper:
(26) 
We proceed similarly to obtain the gradient of the rotation:
(27)  
(28)  
(29)  
(30)  
(31)  
(32)  
(33)  
(34)  
(35) 
These nine equations contain the derivative of the objective function with respect to each element of the rotation matrix. Writing them in a compact form leads to
(36) 
Now, combining this result with (21) gives the gradient of the rotation:
(37) 
VII Objective Function for the General Absolute Pose
Here we address the general absolute pose problem (Sec. III-B of the main paper), considering the geometric distance between 3D points and their inverse 3D projection rays as the metric for the objective function.
VII-A Objective Function
The objective function is based on the geometric distance derived in [38]:
(38) $f(\mathbf{R}, \mathbf{t}) = \sum_{i=1}^{N} \left\| \left(\mathbf{I} - \mathbf{V}_i\right)\left(\mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{c}_i\right) \right\|^2$.
The vector $\mathbf{p}_i$ corresponds to the 3D point in the world frame, $\mathbf{c}_i$ corresponds to the camera's position, and $\mathbf{V}_i$ corresponds to the projection onto the ray direction.
Despite not being necessary for our framework, it is advantageous, for efficiency purposes, to rewrite the above expression in matrix form. For convenience, we define
(39) $\mathbf{Q}_i = \left(\mathbf{I} - \mathbf{V}_i\right)^T \left(\mathbf{I} - \mathbf{V}_i\right)$,
where $\mathbf{Q}_i$ is a symmetric matrix. Replacing these expressions in (38), we obtain
(40) $f(\mathbf{R}, \mathbf{t}) = \sum_{i=1}^{N} \left[ \mathbf{t}^T \mathbf{Q}_i\, \mathbf{t} + 2\, \mathbf{t}^T \mathbf{Q}_i \left(\mathbf{R}\,\mathbf{p}_i - \mathbf{c}_i\right) + \left(\mathbf{R}\,\mathbf{p}_i - \mathbf{c}_i\right)^T \mathbf{Q}_i \left(\mathbf{R}\,\mathbf{p}_i - \mathbf{c}_i\right) \right]$.
The dependence of the objective function on the translation is exhibited in the first term. A linear term in the translation and a cross term in $\mathbf{t}$ and $\mathbf{R}$ appear in the second term. The third term gives rise to a quadratic term in $\text{vec}(\mathbf{R})$, a linear term in $\text{vec}(\mathbf{R})$, and a constant term in the final expression. Applying the Kronecker product to the second term leads to the following:
(41)  
(42)  
(43) 
Now, we obtain two more terms that will appear in the final expression of the objective function. The remaining terms are obtained by expanding the last term in (40). Replacing $\mathbf{Q}_i$ by its value in (39), we obtain
(44)
We note that, for a specific correspondence, the first element of the sum in (44) can be seen as the scalar product of a vector with itself. Applying the Kronecker product to that vector and to the second term of (44) leads to
(45)
Now, replacing (45) in (44), we obtain an expression for the third term in (40):
(46)
Inserting (43) and (46) in (40) gives rise to the following expression for the objective function in matrix form, which can easily be used to calculate the gradients:
(47)
which corresponds to Eq. 11 of the main paper.
VII-B Gradients for $\mathbf{R}$ and $\mathbf{t}$
VIII Objective Function with the UPnP Residual
In this section, we explain how the objective function of the last application example (Sec. III-C of the main paper) and its gradients were obtained.
VIII-A Objective Function
To get an expression for the objective function, we proceed similarly to [21]. The starting point is the constraint already presented in Sec. III-C:
(49) 
which can be written in the form
(50) 
or, more compactly, as:
(51) 
(52) 
Note the dimensions of the matrix involved, which imply that there exist two matrices such that
(53) 
and, as a consequence,
(54) 
Contrary to [21], we are only interested in eliminating the dependence on the depths. Therefore, from the previous expressions,
(55)
where the vector in question is the corresponding row of the matrix above. Since equation (55) does not depend explicitly on the data nor on the parameters we are interested in ($\mathbf{R}$ and $\mathbf{t}$), we consider the vectors that correspond to its 3-element sub-vectors. By making use of these vectors, (55) becomes:
(56) 
By considering the objective function as being
(57) 
and using the residuals in (56), we get the objective function shown in Eq. 11 of the main paper, where:
(58) 
VIII-B Gradients for $\mathbf{R}$ and $\mathbf{t}$
IX Using Our Framework: An Example of Its Application
IX-A Code Prototype
In this section we present the pure abstract class in C++. Whatever class is used to build the objective function must provide an implementation of the functions below; these are passed on to the AMM solver that handles the problem.
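The original listing is not reproduced here; the following sketch illustrates the kind of interface described, with hypothetical names that do not necessarily match the released code.

#include <Eigen/Dense>

// Hypothetical pure abstract class: a pose problem plugs into the AMM solver
// by implementing the objective function and its two Euclidean gradients.
class AmmObjective {
 public:
  virtual ~AmmObjective() = default;

  // Value of the objective f(R, t) for the current estimates.
  virtual double objective(const Eigen::Matrix3d& R,
                           const Eigen::Vector3d& t) const = 0;

  // Euclidean gradient of f w.r.t. the rotation matrix (3x3).
  virtual Eigen::Matrix3d gradientRotation(const Eigen::Matrix3d& R,
                                           const Eigen::Vector3d& t) const = 0;

  // Euclidean gradient of f w.r.t. the translation vector (3x1).
  virtual Eigen::Vector3d gradientTranslation(const Eigen::Matrix3d& R,
                                              const Eigen::Vector3d& t) const = 0;
};

A concrete pose problem (e.g. one of the three applications in Sec. III) would derive from this class, implement the three functions from its own residuals, and hand the instance to the AMM solver.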