1 Introduction

Estimating the pose of a calibrated camera given a set of 2D points in the camera frame and a set of 3D points in the world frame, as shown in Figure 1, is a fundamental part of the general 2D–3D registration problem of aligning an image with a 3D scene or model. When correspondences are known, this becomes the Perspective-n-Point (PnP) problem, for which many solutions exist [16, 26, 23, 18, 22]. Applications include camera localisation and tracking [13, 38, 24], augmented reality [34], motion segmentation [39] and object recognition [19, 36, 2].
While hypothesise-and-test frameworks like RANSAC [13] can mitigate the sensitivity of PnP solvers to outliers in the correspondence set, few approaches are able to handle the case where 2D–3D correspondences are not known in advance. Unknown correspondences arise in many circumstances, including the general case of aligning an image with a textureless 3D pointset or CAD model. While feature extraction techniques provide a relatively robust and reproducible way to detect interest points such as edges or corners within each modality, finding correspondences across the two modalities is much more challenging. Even when the pointset has sufficient visual information associated with it, such as colour or SIFT features
[32], repetitive features, occlusions and perspective distortion make the correspondence problem non-trivial. Moreover, appearance and thus visual features may change significantly between viewpoints, lighting conditions, weather and seasons, whereas scene geometry is often less affected. When relocalising a camera in a previously mapped environment or bootstrapping a tracking algorithm, we contend that geometry is often more reliable. Therefore, there is a need for methods that solve for both pose and correspondences. Efficient local optimisation algorithms for solving this joint problem have been proposed [9, 35]. However, they require a pose prior, search only for local optima and do not provide an optimality guarantee, yielding erroneous pose estimates without a reliable means of detecting failure. Hypothesise-and-test approaches such as RANSAC [13], when applied to the correspondence-free problem [15], are global methods that are not reliant on pose priors, but they quickly become computationally intractable as the number of points and outliers increases and do not provide an optimality guarantee. More recently, a global and suboptimal method has been proposed [5], which uses a branch-and-bound approach to find a camera pose whose trimmed geometric error is within ε of the global minimum.
This work is the first to propose a global and optimal inlier set cardinality maximisation solution to the simultaneous pose and correspondence problem. The approach employs the branch-and-bound framework to guarantee global optimality without requiring a pose prior, ensuring that it is not susceptible to local optima. We use a parametrisation of SE(3) that facilitates branching and derive novel bounds on the objective function. In addition, we also apply local optimisation whenever the algorithm finds a better transformation, to accelerate convergence without voiding the optimality guarantee. Cardinality maximisation allows an exact optimiser to be found, unlike the suboptimality inherent to the continuous objective function used in [5]. More critically, cardinality maximisation is inherently robust to 2D and 3D outliers, while avoiding the problems associated with trimming. The latter requires the user to specify the inlier fraction, which can rarely be known and is less intuitive to select than a geometrically meaningful inlier threshold. If the inlier fraction is over- or underestimated, this approach may converge to the wrong pose, without a means to detect failure. Figure 2 demonstrates how the global optimum of a trimmed objective function, as used by [5, 49], may not occur at the true pose, a problem that is exacerbated when the inlier fraction is guessed incorrectly.
2 Related Work
A large body of work exists for solving the 2D–3D registration problem when correspondences are provided. When the correspondences are known perfectly, Perspective-n-Point (PnP) solvers [16, 26, 23, 18, 22] are able to estimate the pose of a camera given a set of noisy image points and their corresponding 3D points. When outliers are present in the correspondence set, the RANSAC framework [13, 8] or robust global optimisation [27, 11, 1, 48, 12, 47] can be used to find the inlier set. Alternatively, outlier removal schemes can make the problem more tractable [46, 40, 50, 7]. Other methods develop sophisticated matching strategies to avoid outlier correspondences at the outset [30, 44, 45, 29]. However, these methods require some correct correspondences. For this reason, they are often only practical for 3D models that have been constructed using stereopsis or Structure-from-Motion (SfM). These models associate an image feature with each 3D point, facilitating inter-modality feature matching. Generic pointsets do not have this property; a point may lie anywhere on the underlying surfaces in a laser scan, not just where strong image gradients occur.
When correspondences are unknown, the problem becomes more challenging. For the 2D–2D case, problems such as correspondence-free rigid registration [3, 4], SfM [10, 33, 31] and relative camera pose [14] have been addressed. For the 2D–3D case, solutions have been proposed for registering a collection of images [43] or multiple cameras [42] to a 3D pointset. The more general problem, however, is pose estimation from a single image. David [9] proposed the SoftPOSIT algorithm, which alternates correspondence assignment with an iterative pose update algorithm. Moreno-Noguer [35]
proposed the BlindPnP algorithm, which represents the pose prior as a Gaussian mixture model from which a Kalman filter is initialised for matching. It outperformed SoftPOSIT when large amounts of clutter, occlusions and repetitive patterns were present. However, both are susceptible to local optima, require a pose prior and cannot guarantee global optimality.
Grimson [15] applied a RANSAC-like approach to the correspondence-free case, removing the need for a pose prior, but the method is not optimal and quickly becomes intractable as the number of points increases. In contrast, globally-optimal methods find a camera pose that is guaranteed to be an optimiser of an error function without requiring a pose prior, but tractability remains a challenge. A Branch-and-Bound (BB) [25] strategy may be applied in these cases, for which bounds need to be derived. For example, Breuel [4] used BB for 2D–2D registration problems, Hartley and Kahl [17] for optimal relative pose estimation by bounding the group of 3D rotations, Li and Hartley [28] for rotation-only 3D–3D registration, Olsson [41] for 3D–3D registration with known correspondences, Yang [49] for full 3D–3D registration and Campbell and Petersson [6] for robust 3D–3D registration. While not optimal, Jurie [20] used an approach similar to BB for 2D–3D alignment with a linear approximation of perspective projection. Brown [5] proposed a global and suboptimal method using BB. It finds a camera pose whose trimmed geometric error, the sum of angular distances between the bearings and their rotationally-closest 3D points, is within ε of the global minimum. While not susceptible to local minima, it requires the inlier fraction to be specified, which can rarely be known in advance, in order to trim outliers.
Our work is the first globally-optimal inlier set cardinality maximisation solution to the simultaneous pose and correspondence problem. It is guaranteed to find the exact global optimum without requiring a pose prior and is robust to 2D and 3D outliers while avoiding the distortion of trimming. The rest of the paper is organised as follows: we introduce the problem formulation in Section 3, develop a parametrisation of the domain of 3D motions, a branching strategy and a derivation of the bounds in Section 4, propose an algorithm for globally-optimal pose and correspondence in Section 5 and evaluate its performance in Section 6.
3 Inlier Set Cardinality Maximisation
Let p ∈ ℝ³ be a 3D point and f ∈ 𝕊² be a bearing vector with unit norm, corresponding to a 2D point imaged by a calibrated camera. That is, f = K⁻¹ū / ‖K⁻¹ū‖, where K is the matrix of intrinsic camera parameters and ū is the homogeneous image point. Given a set of points {p_i}, a set of bearing vectors {f_j} and an inlier threshold ε, the objective is to find a rotation R ∈ SO(3) and translation t ∈ ℝ³ that maximise the cardinality of the inlier set

    (R*, t*) = argmax_{R, t} |S(R, t)|,   (1)
    S(R, t) = { f_j : ∃i, ∠(f_j, R(p_i + t)) ≤ ε },   (2)

where ∠(·, ·) denotes the angular distance between vectors. An equivalent formulation is given by
    (R*, t*) = argmax_{R, t} C(R, t),   (3)
    C(R, t) = Σ_j max_i 𝟙( ε − ∠(f_j, R(p_i + t)) ),   (4)
where 𝟙 is the indicator function that has the value 1 for all elements of the non-negative real numbers and the value 0 otherwise. The optimal transformation parameters R* and t* allow us to find all correspondences with respect to ε by identifying all pairs (p_i, f_j) for which ∠(f_j, R*(p_i + t*)) ≤ ε. We maximise the cardinality of the set of bearing vector inliers, not the set of 3D point inliers, to avoid the degenerate case of all points sharing the same bearing vector inlier, which occurs when the camera is translated far away from the pointset.
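Evaluated at a single pose, objective (4) reduces to counting the bearings that lie within the angular threshold of some transformed point. A minimal NumPy sketch (variable names are illustrative; the transformed point is taken as R(p + t), as in the formulation above):

```python
import numpy as np

def inlier_cardinality(R, t, points, bearings, eps):
    """Evaluate the cardinality objective C(R, t): the number of bearing
    vectors within angular distance eps of at least one transformed point."""
    q = (points + t) @ R.T                          # transformed points R(p_i + t)
    q /= np.linalg.norm(q, axis=1, keepdims=True)   # normalise to unit vectors
    cosines = np.clip(bearings @ q.T, -1.0, 1.0)    # pairwise cosines (M x N)
    angles = np.arccos(cosines)                     # pairwise angular distances
    return int(np.sum(angles.min(axis=1) <= eps))   # count inlier bearings
```

For example, at the identity pose every bearing that is aligned with a 3D point within ε is counted once, regardless of how many points it matches.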
4 BranchandBound
To solve the highly non-convex cardinality maximisation problem (1), the global optimisation technique of Branch-and-Bound (BB) [25] may be applied. To do so, a suitable means of parametrising and branching (partitioning) the function domain must be found, as well as an efficient way to calculate upper and lower bounds of the function for each branch, which converge as the size of the branch tends to zero. While the bounds need to be computationally efficient to calculate, the time and memory efficiency of the algorithm also depends on how tight the bounds are, since tighter bounds reduce the search space more quickly by allowing suboptimal branches to be pruned.
4.1 Parametrising and Branching the Domain
To find a globally-optimal solution, the cardinality of the inlier set must be maximised over the domain of 3D motions, that is, the group SE(3). However, the space of these transformations is unbounded; therefore we restrict the space of translations to be within a bounded set 𝕋 in order to use BB. For a suitably large 𝕋, it is reasonable to assume that the camera centre lies within it. That is, we can assume that the camera is a finite distance from the 3D points. The domains are shown in Figure 3.
Rotation space is minimally parametrised with angle-axis 3-vectors r, with rotation angle ‖r‖ and rotation axis r/‖r‖. The notation R_r is used to denote the rotation matrix obtained from the matrix exponential map of the skew-symmetric matrix [r]× induced by r. The Rodrigues rotation formula can be used to efficiently calculate this mapping. Using this parametrisation, the space of all 3D rotations can be represented as a solid ball of radius π in ℝ³. The mapping is one-to-one on the interior of the ball and two-to-one on the surface. For ease of manipulation, we use the 3D cube [−π, π]³ circumscribing the ball as the rotation domain [28]. Translation space is parametrised with 3-vectors in a bounded domain 𝕋 chosen as the cuboid containing the bounding box of the pointset. If the camera is known to be inside the 3D scene, 𝕋 can be set to the bounding box, otherwise it is set to an expansion of the bounding box. During BB, the domain is branched into subcuboids using nested octree data structures. They are defined as
    Q(c, δ) = { c + Σ_{k=1}^{3} λ_k δ_k e_k : λ_k ∈ [−1, 1] },   (5)

where e_k is the k-th standard basis vector, c is the centre and δ is the vector of half side-lengths. To simplify the notation, we use σ_r and σ_t for the half side-lengths of the rotation and translation subcubes.
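The angle-axis to rotation-matrix mapping described above can be implemented directly with Rodrigues' formula; a self-contained sketch (function and variable names are illustrative):

```python
import numpy as np

def rotation_from_angle_axis(r):
    """Rodrigues' formula: map an angle-axis vector r (axis r/|r|,
    angle |r|) to the rotation matrix exp([r]_x)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:                  # near the origin, exp([r]_x) ~ identity
        return np.eye(3)
    a = r / theta                      # unit rotation axis
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])  # skew-symmetric matrix [a]_x
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

A quarter-turn about the z-axis, for instance, maps the x-axis to the y-axis.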
The uncertainty angle induced by a rotation and translation subcuboid on a point p is shown in Figure 4. The transformed point may lie anywhere within an uncertainty cone, with aperture angle equal to the sum of the rotation and translation uncertainty angles.
4.2 Bounding the Branches
The success of a BB algorithm is predicated on the quality of its bounds. For inlier set maximisation, the objective function (4) needs to be bounded within a transformation domain. Some preparatory material is now presented.
To bound the uncertainty angle due to rotation, Lemmas 1 and 2 from [17] are used. For reference, the relevant parts are merged into Lemma 1, as in [49]. The lemma indicates that the angle between two rotated vectors is less than or equal to the Euclidean distance between their rotations' angle-axis representations in ℝ³.
Lemma 1.
For an arbitrary vector v and two rotations, represented as R and R′ in matrix form and r and r′ in angle-axis form,

    ∠(Rv, R′v) ≤ ‖r − r′‖.   (6)
From this, the maximum angle between a vector rotated by R_r, for any r within a cube centred at r0, and the same vector rotated by R_{r0} can be found as follows.
Lemma 2.
(Weak rotation uncertainty angle) Given a 3D point p and a rotation cube C_r of half side-length σ_r centred at r0, then ∀r ∈ C_r,

    ∠(R_r p, R_{r0} p) ≤ min(√3 σ_r, π).   (7)

Proof.
By Lemma 1, ∠(R_r p, R_{r0} p) ≤ ‖r − r0‖ ≤ √3 σ_r for all r in the cube, and the angular distance cannot exceed π. ∎
However, a tighter bound can be found by observing that a point rotated about an axis parallel to the point is not displaced. To exploit this, we maximise the angle over the surface of the cube C_r.
Lemma 3.
(Rotation uncertainty angle) Given a 3D point p and a rotation cube C_r centred at r0 with surface ∂C_r, then ∀r ∈ C_r,

    ∠(R_r p, R_{r0} p) ≤ max_{r′ ∈ ∂C_r} ∠(R_{r′} p, R_{r0} p).   (10)
Proof.
Inequality (10) can be derived as follows:

    ∠(R_r p, R_{r0} p) ≤ max_{r′ ∈ C_r} ∠(R_{r′} p, R_{r0} p)   (11)
    = max_{r′ ∈ ∂C_r} ∠(R_{r′} p, R_{r0} p),   (12)

where (12) is a consequence of the order-preserving mapping, with respect to the radial angle, from the convex cube of angle-axis vectors to the spherical surface patch (see Figure 3(a)), since the mapping is obtained by projecting from the centre of the sphere to the surface of the sphere. See the appendix for further details. ∎
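The weak bound of Lemma 2 is simple to compute and to verify numerically. A sketch, assuming the bound takes the form min(√3·σ_r, π) stated above (the tighter surface maximisation of Lemma 3 is omitted; names are illustrative):

```python
import numpy as np

def rot(r):
    """Rodrigues' formula: angle-axis vector to rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    a = r / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def weak_rotation_uncertainty(half_side):
    """Weak bound of Lemma 2: ||r - r0|| <= sqrt(3)*half_side for any r in
    the cube, so Lemma 1 caps the uncertainty angle (never exceeding pi)."""
    return min(np.sqrt(3.0) * half_side, np.pi)

# Empirical check: no rotation in the cube displaces p from its
# centre-rotated position by more than the bound.
rng = np.random.default_rng(0)
r0, sigma, p = np.array([0.3, -0.2, 0.5]), 0.1, np.array([1.0, 2.0, -1.0])
u = rot(r0) @ p
for _ in range(200):
    v = rot(r0 + rng.uniform(-sigma, sigma, 3)) @ p
    ang = np.arccos(np.clip(u @ v / (u @ u), -1.0, 1.0))  # |u| == |v| == |p|
    assert ang <= weak_rotation_uncertainty(sigma) + 1e-9
```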
The uncertainty angle due to translation can be bounded by observing that the translated points form a cube (Figure 3(b)). When the cube does not contain the origin, the angle can be found by maximising over the cube vertices.
Lemma 4.
(Translation uncertainty angle) Given a 3D point p and a translation cube C_t of half side-length σ_t centred at t0 with vertices {v_k}, then ∀t ∈ C_t,

    ∠(p + t, p + t0) ≤ π if ‖p + t0‖∞ ≤ σ_t, and ∠(p + t, p + t0) ≤ max_k ∠(p + v_k, p + t0) otherwise.   (13)
Proof.
Observe that for ‖p + t0‖∞ ≤ σ_t, the cube containing all translated points also contains the origin. Therefore p + t can be proportional to −(p + t0) and thus the maximum angle is π. For ‖p + t0‖∞ > σ_t,
    ∠(p + t, p + t0) ≤ max_{t′ ∈ C_t} ∠(p + t′, p + t0)   (14)
    = max_k ∠(p + v_k, p + t0),   (15)

where (15) follows from the convexity of the angle function in this domain. The maximum of a convex function over a convex set must occur at one of its extreme points (the vertices). Geometrically, the cube projects to a spherical hexagon on the unit sphere. The maximum geodesic from a point in the hexagon to any other is to a vertex. ∎
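Lemma 4 can be sketched directly: return π when the cube of translated points contains the origin, otherwise maximise the angle over the eight vertices. Function and variable names are illustrative:

```python
import numpy as np
from itertools import product

def angle(u, v):
    """Angular distance between two 3-vectors."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def translation_uncertainty(p, t0, half_side):
    """Translation uncertainty angle of Lemma 4 (sketch): pi when the cube
    of translated points contains the origin, otherwise the maximum angle
    to the centre ray, attained at one of the eight vertices."""
    centre = p + t0
    if np.max(np.abs(centre)) <= half_side:      # origin inside the cube
        return np.pi
    corners = half_side * np.array(list(product((-1.0, 1.0), repeat=3)))
    return max(angle(centre, centre + c) for c in corners)
```

For a cube far from the origin the bound is small; as the cube approaches the origin the bound grows towards π, matching the degenerate case in the proof.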
To avoid the non-physical case where a 3D point is located within a very small distance of the camera centre, we restrict the translation domain such that ‖p_i + t‖ > ρ for all i, for a small ρ > 0.
The translation bound from [5] encloses a translation cube with a sphere of radius √3 σ_t and is given by

    ψ′_t = arcsin( min( √3 σ_t / ‖p + t0‖, 1 ) ).   (16)

Our bound is tighter, with the largest improvement occurring for cuboids. Figure 5 compares both translation bounds across a range of values.
The preceding lemmas are used to bound the objective function (4) within a transformation domain C_r × C_t centred at (r0, t0). For brevity, we use the notation ψ_r^i and ψ_t^i for the rotation (10) and translation (13) uncertainty angles of point p_i.
Theorem 1.
(Lower bound) For the domain C_r × C_t centred at (r0, t0), the lower bound of the inlier set cardinality can be chosen as

    C̲ = C(R_{r0}, t0) = Σ_j max_i 𝟙( ε − ∠(f_j, R_{r0}(p_i + t0)) ).   (17)
Proof.
The validity of the lower bound follows from

    C(R_{r0}, t0) ≤ max_{r ∈ C_r, t ∈ C_t} C(R_r, t).   (18)
That is, the function value at a specific point within the domain is less than or equal to the maximum. ∎
Theorem 2.
(Upper bound) For the domain C_r × C_t centred at (r0, t0), the upper bound of the inlier set cardinality can be chosen as

    C̄ = Σ_j max_i 𝟙( (ε + ψ_r^i + ψ_t^i) − ∠(f_j, R_{r0}(p_i + t0)) ).   (19)
Proof.
For any (r, t) in the domain, the triangle inequality for angular distances gives ∠(f_j, R_r(p_i + t)) ≥ ∠(f_j, R_{r0}(p_i + t0)) − ψ_r^i − ψ_t^i, so every inlier pair of any pose in the domain is also counted by (19). ∎
By inspecting the translation component of Theorem 2, a tighter upper bound may be found by removing one of the two applications of the triangle inequality. A similar approach cannot be taken for the rotation component, since the set {R_r p : r ∈ C_r} is a complex surface due to the non-linear conversion from angle-axis to rotation matrix representations. To reduce computation, it is only necessary to evaluate this tighter bound when ∠(f_j, R_{r0}(p_i + t0)) ≤ ε + ψ_r^i + ψ_t^i, since otherwise the pair is definitely an outlier within the domain and does not need to be investigated further.
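Given precomputed uncertainty angles, the bounds of Theorems 1 and 2 amount to two threshold tests per point–bearing pair. An illustrative sketch (points_cam holds the centre-transformed points R_{r0}(p_i + t0); psi_r and psi_t are assumed precomputed per point):

```python
import numpy as np

def pairwise_angles(f, points_cam):
    """Angles between a unit bearing f and each (normalised) transformed point."""
    q = points_cam / np.linalg.norm(points_cam, axis=1, keepdims=True)
    return np.arccos(np.clip(q @ f, -1.0, 1.0))

def interval_bounds(points_cam, bearings, eps, psi_r, psi_t):
    """Sketch of the interval bounds: the lower bound (Theorem 1) evaluates
    the objective at the domain centre; the upper bound (Theorem 2) relaxes
    the inlier threshold by the per-point uncertainty angles."""
    lower = upper = 0
    for f in bearings:
        theta = pairwise_angles(f, points_cam)
        lower += int(np.min(theta) <= eps)                  # inlier at centre
        upper += int(np.min(theta - psi_r - psi_t) <= eps)  # possible inlier
    return lower, upper
```

By construction lower ≤ upper, and the two coincide as the uncertainty angles shrink to zero.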
Theorem 3.
(Tighter upper bound) For the domain C_r × C_t centred at (r0, t0), the upper bound of the inlier set cardinality can be chosen as

    C̄′ = Σ_j max_i 𝟙( (ε + ψ_r^i) − θ_{ij}^min ),   (22)
    θ_{ij}^min = min_{t′ ∈ C_t} ∠( f_j, R_{r0}(p_i + t′) ).   (23)
Proof.
θ_{ij}^min may be evaluated by observing that the minimum angle between a ray and a cube is zero if the ray passes through the cube, and is otherwise the angle between the ray and the point on the skeleton of the cube (vertices and edges) with least angular displacement from the ray. Thus, for the translation domain C_t with skeleton S_t,

    θ_{ij}^min = 0 if the ray along R_{r0}⁻¹ f_j intersects p_i + C_t, and min_{s ∈ S_t} ∠( f_j, R_{r0}(p_i + s) ) otherwise.   (26)
5 The GOPAC Algorithm
The Globally-Optimal Pose And Correspondences (GOPAC) algorithm for a calibrated camera is outlined in Algorithms 1 and 2. As in [49], we employ a nested branch-and-bound structure for computational efficiency. In the outer breadth-first BB search, upper and lower bounds are found for each translation cuboid by running an inner BB search over rotation space (denoted RBB). The upper bound (19) of a translation cuboid is found by running RBB until convergence, using the following bounds for each rotation subcube centred at r0:

    lower: Σ_j max_i 𝟙( (ε + ψ_t^i) − ∠(f_j, R_{r0}(p_i + t0)) )   (27)
    upper: Σ_j max_i 𝟙( (ε + ψ_r^i + ψ_t^i) − ∠(f_j, R_{r0}(p_i + t0)) ).   (28)

The tighter upper bound (22) instead uses

    lower: Σ_j max_i 𝟙( ε − θ_{ij}^min )   (29)
    upper: Σ_j max_i 𝟙( (ε + ψ_r^i) − θ_{ij}^min ).   (30)

The lower bound (17) is found by running RBB using bounds (27) and (28) with ψ_t^i set to zero.
The nested structure has better memory and computational efficiency than directly branching over 6D transformation space, since it maintains a queue for each 3D subproblem, rather than one for the entire 6D problem. This requires significantly fewer simultaneously enqueued subcubes. Moreover, with rotation search nested inside translation search, {p_i + t} only has to be calculated once per translation t, not once per pose, and {f_j} can be rotated (by R⁻¹) instead of {p_i}, which typically has more elements. This makes it possible to precompute the rotated bearing vectors and rotation bounds for the top five levels of the rotation octree to reduce the amount of computation required in the inner BB subroutine.
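The search itself follows the standard best-first branch-and-bound pattern. The sketch below is a simplified single-queue version for illustration only; the paper's nested translation/rotation structure, octree branching and PnP refinement are abstracted behind the supplied functions:

```python
import heapq

def branch_and_bound(root, upper_fn, lower_fn, split_fn, size_fn, min_size):
    """Generic best-first BB loop (a simplified single-queue sketch; the
    bounds must satisfy lower <= max-over-branch <= upper)."""
    best_val, best = lower_fn(root), root
    counter = 1                                   # heap tie-breaker
    queue = [(-upper_fn(root), 0, root)]          # max-heap on the upper bound
    while queue:
        neg_ub, _, cube = heapq.heappop(queue)
        if -neg_ub <= best_val:                   # nothing left can improve
            break
        lb = lower_fn(cube)
        if lb > best_val:                         # new incumbent; local
            best_val, best = lb, cube             # refinement would go here
        if size_fn(cube) > min_size:
            for child in split_fn(cube):          # e.g. octree subdivision
                ub = upper_fn(child)
                if ub > best_val:                 # prune suboptimal branches
                    heapq.heappush(queue, (-ub, counter, child))
                    counter += 1
    return best_val, best
```

With interval domains represented as (centre, half-width) and valid bounds on a toy objective, the loop converges to the maximiser; in GOPAC the analogous outer loop runs over translation cuboids with RBB supplying the bounds.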
Line 11 of Algorithm 1 shows how local optimisation is incorporated to refine the camera pose, in a similar manner to [49, 5]. Whenever the BB algorithm finds a subcube pair with a lower bound greater than half the best-so-far cardinality C*, the PnP problem is solved, with correspondences given by the inlier pairs at the centre pose. We use non-linear optimisation [21], minimising the sum of angular distances between corresponding bearing vectors and points, and update C* if a larger cardinality is found. In this way, BB and PnP collaborate, with PnP finding the best pose given correspondences and BB guiding the search for correspondences. PnP accelerates convergence since the faster C* is increased, the sooner suboptimal subcubes can be culled (Alg. 1 Line 13). SoftPOSIT [9] is also applied at this stage to jump to the nearest local maximum.
As just observed, a large C* reduces runtime. Therefore, if the user knows a lower bound on the number of 2D inliers, C* can be initialised to this value. However, this is rarely known. Instead, our algorithm implements an optional guess-and-verify approach, without loss of optimality or objective function distortion, which provides especial benefit when 2D outliers are rare: initialise C* from a guessed 2D outlier fraction; run GOPAC; stop if an optimality guarantee is found, otherwise decrease the guessed inlier count and repeat.
We also provide a multithreaded implementation, where the initial translation domain is divided into subdomains and GOPAC is run for each in separate CPU threads. The algorithm returns the largest C* and the associated pose and correspondences. While not provided here, a massively parallel implementation on a GPU is very feasible. Further algorithmic details are provided in the appendix.
6 Results
The GOPAC algorithm was evaluated with respect to the baseline RANSAC [13], SoftPOSIT [9] and BlindPnP [35] algorithms, denoted GP, RS, SP and BP respectively, with synthetic and real data. The RANSAC approach uses the OpenGV framework [21] and the P3P algorithm [23] with randomly-sampled correspondences. Since SoftPOSIT and BlindPnP require pose priors to function, we use a torus prior in the synthetic experiments. In general, the space of camera poses is much larger than the restrictive torus prior and a good prior can rarely be known in advance. Except where otherwise specified, a fixed inlier threshold was used, the rotation and translation bounds (10) and (13) were used, SoftPOSIT and non-linear PnP refinement were applied and multithreading was not used. It is crucial to observe that finding the global optimum does not necessarily imply finding the ground-truth transformation. There may be multiple global optima, particularly in the case of symmetries, and noise may create false optima.
6.1 Synthetic Data Experiments
To evaluate our algorithm in a setting where true priors can be applied, we performed 50 independent Monte Carlo simulations per parameter setting, using the framework of [35]: random 3D points were generated within a bounded region; a fraction of the 3D points were randomly selected as outliers to model occlusion; the inliers were projected to a virtual image; normally-distributed pixel noise was added; and random points were added to the image such that a fraction of the 2D points were outliers. To facilitate fair comparison with SoftPOSIT and BlindPnP, we use a pose prior for these experiments. The torus prior constrains the camera centre to a torus around the pointset with the optical axis directed towards the model, as in [35]. BlindPnP represents the poses with a 20-component Gaussian mixture model, the means of which are used to initialise SoftPOSIT, as in [35]. GOPAC is given a set of translation cubes which approximate the torus and is not given the rotation priors.
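The simulation protocol can be sketched as follows; all numeric values (extent of the point cloud, focal length, camera offset, outlier fractions) are illustrative stand-ins rather than the paper's settings:

```python
import numpy as np

def make_scene(n=40, outlier_frac_2d=0.2, outlier_frac_3d=0.2,
               noise_px=2.0, focal=800.0, seed=0):
    """Sketch of the Monte Carlo protocol: random 3D points, a fraction
    occluded (3D outliers), the rest projected with pixel noise, and
    random 2D clutter appended so a fraction of image points are outliers."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, (n, 3))             # random 3D model points
    pts_cam = pts + np.array([0.0, 0.0, 6.0])        # camera in front of model
    uv = focal * pts_cam[:, :2] / pts_cam[:, 2:3]    # pinhole projection
    uv += rng.normal(0.0, noise_px, uv.shape)        # image noise
    keep = rng.random(n) >= outlier_frac_3d          # occluded points drop out
    uv = uv[keep]
    n_out = int(len(uv) * outlier_frac_2d / (1.0 - outlier_frac_2d))
    clutter = rng.uniform(uv.min(), uv.max(), (n_out, 2))  # 2D outliers
    return pts, np.vstack([uv, clutter])
```

The returned image points would then be converted to bearing vectors with the known intrinsics before running the solvers.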
The results are shown in Figures 7 and 7(a). We repeated the experiments for the repetitive CAD structure shown in Figure 8(a), with results shown in Figure 7(b). Two success rates are reported: the fraction of trials where the true maximum number of inliers was found, and the fraction where the correct pose was found, that is, where the angle between the output rotation and the ground-truth rotation and the camera centre error relative to the ground truth are both below fixed thresholds, as in [35]. The 2D and 3D outlier fractions were fixed to 0 when not being varied and multithreading was used in the 2D outlier experiments. GOPAC outperforms the other methods, reliably finding the global optimum while still being relatively efficient, particularly when the fraction of 2D outliers is low. For the repetitive CAD structure, while GOPAC finds the globally-optimal number of inliers in all cases, the pose is occasionally incorrect when a large fraction of the 3D points is occluded, due to the highly symmetric nature of the model.

The evolution of the global lower and upper bounds is shown in Figure 8(c): BB and PnP collaborate to increase the lower bound, with BB guiding the search into better convergence basins and PnP refining the bound by jumping to the nearest local maximum (the staircase pattern). The majority of the time is spent decreasing the upper bound, indicating that the algorithm will often have already found the global optimum when terminated early.



Figure 8: Sample 2D and 3D results for two trials using the random points and repetitive CAD model datasets. (a) 3D models, true and GOPAC-estimated camera frusta (completely overlapping) and toroidal pose priors; only non-occluded 3D points are shown. (b) True projections of non-occluded 3D points are shown as black dots, 2D outliers as red dots, GOPAC projections as black circles and GOPAC-classified 3D outliers as red crosses. (c) Evolution over time of the upper and lower bounds (black), remaining translation volume (blue) and translation queue size (green) as a fraction of their maximum values. Best viewed in colour.
To show the improvement attributable to the tighter upper bounds derived, we measured the runtime of the algorithm with 10 random 3D points and 50% 2D outliers using different upper bounds, shown in Figure 10: the weak sphere-based bounding functions in (7) and (16), the tighter cuboid-based bounding functions in (10) and (13), and the bounding function from (22). Further results are provided in the appendix.
6.2 Real Data Experiments
To evaluate the algorithm on real data, we use the Data61/2D3D (formerly NICTA) dataset [37], a large and repetitive multi-modal outdoor dataset. Finding the pose of a camera within a large laser-scanned pointset without a good initialisation represents an unsolved problem in computer vision, which this work makes progress towards solving. For each image, we obtain the ground-truth camera pose from the provided 2D–3D correspondences using EPnP [26] followed by non-linear PnP [21]. Extracting points from a laser scan that correspond to known pixels in an image is itself a challenging unsolved problem for 2D–3D registration pipelines. Due to the robust and optimal nature of GOPAC, we can relax this problem to isolating regions of the pointset that appear in the image and vice versa, from which putative correspondences may be drawn. We used semantic segmentations of the images and pointset to select regions that were potentially observable in both modalities, in this case the 'building' class. We then used grid downsampling and k-means clustering on the class pixels and points independently to reduce them to a manageable size and converted the pixels to bearing vectors. While we do not know the correspondences in advance, each bearing vector has a good chance of having a 3D point as an inlier. In this way, we constructed a dataset consisting of a 3D pointset with 88 points, a set of 11 images containing 30 2D features and a set of ground-truth camera poses. For this experiment, we used a fixed inlier threshold, multithreading and a guessed 2D outlier fraction. The translation domain was a cuboid covering two lanes of the road, making use of the knowledge that the camera was mounted on a survey vehicle. SoftPOSIT and BlindPnP failed to find the correct camera pose for every image in this dataset, even when supplied the ground-truth pose as a prior, due to the weak ground-truth correspondences and an inability to handle 3D points behind the camera.
Moreover, they do not natively support panoramic imagery and required an artificially restricted field of view to function.
Qualitative results for the GOPAC and RANSAC algorithms are shown in Figure 1 and quantitative results in Table 1. GOPAC finds the optimal number of inliers for all frames and the correct camera pose for the majority of frames, despite the weakness of the 2D/3D point extraction process, surpassing the other methods. The failure modes for GOPAC were rotation flips, due to ambiguities arising from the low angular separation of points in the vertical direction. The difficulty of this ill-posed problem is illustrated by the performance of truncated GOPAC, which was not able to find all optima even after running for 30s, motivating the necessity for globally-optimal guided search.
Table 1: Quantitative results on the Data61/2D3D dataset.

Method                  | GP   | RS   | SP   | BP
Translation Error (m)   | 2.30 | 3.10 | 20.3 | 28.5
Rotation Error (°)      | 2.08 | 3.04 | 178  | 179
Recall (Inliers)        | 1.00 | 0.97 | 0.75 | 0.81
Success Rate (Inliers)  | 1.00 | 0.45 | 0.00 | 0.00
Success Rate (Pose)     | 0.82 | 0.64 | 0.09 | 0.09
Runtime (s)             | 477  | 34   | 34   | 471
7 Conclusion
In this paper, we have introduced a robust and globally-optimal solution to the simultaneous camera pose and correspondence problem using inlier set cardinality maximisation. The method applies the branch-and-bound paradigm to guarantee optimality regardless of initialisation and uses local optimisation to accelerate convergence. The pivotal contribution is the derivation of the function bounds using the geometry of SE(3). The algorithm outperformed other local and global methods on challenging synthetic and real datasets, finding the global optimum reliably. Further investigation is warranted to develop a complete 2D–3D pipeline, from segmentation and clustering to alignment.
References

[1] E. Ask, O. Enqvist, and F. Kahl. Optimal geometric fitting under the truncated L2-norm. In Proc. 2013 Conf. Comput. Vision Pattern Recognition, pages 1722–1729. IEEE, 2013.
 [2] M. Aubry, D. Maturana, A. A. Efros, B. C. Russell, and J. Sivic. Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. In Proc. 2014 Conf. Comput. Vision Pattern Recognition, pages 3762–3769, 2014.
 [3] P. J. Besl and N. D. McKay. A method for registration of 3D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 14(2):239–256, 1992.
 [4] T. M. Breuel. Implementation techniques for geometric branch-and-bound matching methods. Computer Vision and Image Understanding, 90(3):258–294, 2003.
 [5] M. Brown, D. Windridge, and J.-Y. Guillemaut. Globally optimal 2D-3D registration from points or lines without correspondences. In Proc. 2015 Int. Conf. Comput. Vision, pages 2111–2119, 2015.
 [6] D. Campbell and L. Petersson. GOGMA: Globally-Optimal Gaussian Mixture Alignment. In Proc. 2016 Conf. Comput. Vision Pattern Recognition, pages 5685–5694. IEEE, June 2016.
 [7] T.-J. Chin, Y. Heng Kee, A. Eriksson, and F. Neumann. Guaranteed outlier removal with mixed integer linear programs. In Proc. 2016 Conf. Comput. Vision Pattern Recognition, pages 5858–5866, 2016.
 [8] O. Chum and J. Matas. Optimal randomized RANSAC. IEEE Trans. Pattern Anal. Mach. Intell., 30(8):1472–1482, 2008.
 [9] P. David, D. Dementhon, R. Duraiswami, and H. Samet. SoftPOSIT: simultaneous pose and correspondence determination. Int. J. Comput. Vision, 59(3):259–284, 2004.
 [10] F. Dellaert, S. M. Seitz, C. E. Thorpe, and S. Thrun. Structure from motion without correspondence. In Proc. 2000 Conf. Comput. Vision Pattern Recognition, volume 2, pages 557–564. IEEE, 2000.
 [11] O. Enqvist, E. Ask, F. Kahl, and K. Åström. Robust fitting for multiple view geometry. In Proc. 2012 European Conf. Comput. Vision, pages 738–751. Springer Berlin Heidelberg, 2012.
 [12] O. Enqvist, E. Ask, F. Kahl, and K. Åström. Tractable algorithms for robust model estimation. Int. J. Comput. Vision, 112(1):115–129, 2015.
 [13] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
 [14] J. Fredriksson, V. Larsson, C. Olsson, and F. Kahl. Optimal relative pose with unknown correspondences. In Proc. 2016 Conf. Comput. Vision Pattern Recognition, pages 1728–1736. IEEE, 2016.
 [15] W. E. L. Grimson. Object Recognition by Computer: The Role of Geometric Constraints. MIT Press, Cambridge, MA, USA, 1990.
 [16] R. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle. Review and analysis of solutions of the three point perspective pose estimation problem. Int. J. Comput. Vision, 13(3):331–356, 1994.
 [17] R. I. Hartley and F. Kahl. Global optimization through rotation space search. Int. J. Comput. Vision, 82(1):64–79, 2009.
 [18] J. A. Hesch and S. I. Roumeliotis. A direct least-squares (DLS) method for PnP. In Proc. 2011 Int. Conf. Comput. Vision, pages 383–390. IEEE, 2011.
 [19] D. P. Huttenlocher and S. Ullman. Recognizing solid objects by alignment with an image. Int. J. Comput. Vision, 5(2):195–212, 1990.
 [20] F. Jurie. Solution of the simultaneous pose and correspondence problem using Gaussian error model. Computer Vision and Image Understanding, 73(3):357–373, 1999.
 [21] L. Kneip and P. Furgale. OpenGV: A unified and generalized approach to realtime calibrated geometric vision. In Proc. 2014 Int. Conf. Robotics and Automation, pages 1–8. IEEE, 2014.
 [22] L. Kneip, H. Li, and Y. Seo. UPnP: An optimal O(n) solution to the absolute pose problem with universal applicability. In Proc. 2014 European Conf. Comput. Vision, pages 127–142. Springer, 2014.
 [23] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In Proc. 2011 Conf. Comput. Vision Pattern Recognition, pages 2969–2976. IEEE, 2011.
 [24] L. Kneip, Z. Yi, and H. Li. SDICP: Semi-dense tracking based on iterative closest points. In Proc. 2015 British Machine Vision Conference, pages 100.1–100.12. BMVA Press, Sep. 2015.
 [25] A. H. Land and A. G. Doig. An automatic method of solving discrete programming problems. Econometrica: Journal of the Econometric Society, pages 497–520, 1960.
 [26] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81(2):155–166, 2009.
 [27] H. Li. Consensus set maximization with guaranteed global optimality for robust geometry estimation. In Proc. 2009 Int. Conf. Comput. Vision, pages 1074–1080. IEEE, 2009.
 [28] H. Li and R. Hartley. The 3D-3D registration problem revisited. In Proc. 2007 Int. Conf. Comput. Vision, pages 1–8. IEEE, 2007.
 [29] Y. Li, N. Snavely, D. Huttenlocher, and P. Fua. Worldwide pose estimation using 3D point clouds. In Proc. 2012 European Conf. Comput. Vision, pages 15–29. Springer-Verlag, 2012.
 [30] Y. Li, N. Snavely, and D. P. Huttenlocher. Location recognition using prioritized feature matching. In Proc. 2010 European Conf. Comput. Vision, pages 791–804. Springer, 2010.
 [31] W.-Y. Lin, L.-F. Cheong, P. Tan, G. Dong, and S. Liu. Simultaneous camera pose and correspondence estimation with motion coherence. Int. J. Comput. Vision, 96(2):145–161, 2012.
 [32] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
 [33] A. Makadia, C. Geyer, and K. Daniilidis. Correspondence-free structure from motion. Int. J. Comput. Vision, 75(3):311–327, 2007.
 [34] E. Marchand, H. Uchiyama, and F. Spindler. Pose estimation for augmented reality: a hands-on survey. IEEE Trans. Vis. Comput. Graphics, 22(12):2633–2651, 2016.
 [35] F. Moreno-Noguer, V. Lepetit, and P. Fua. Pose priors for simultaneously solving alignment and correspondence. In Proc. 2008 European Conf. Comput. Vision, pages 405–418. Springer, 2008.
 [36] J. L. Mundy. Object recognition in the geometric era: A retrospective. In J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, editors, Toward Category-Level Object Recognition, volume 4170 of Lecture Notes in Computer Science, pages 3–28. Springer, Berlin, Heidelberg, 2006.
 [37] S. T. Namin, M. Najafi, M. Salzmann, and L. Petersson. A multi-modal graphical model for scene analysis. In Proc. 2015 Winter Conf. Applications Comput. Vision, pages 1006–1013. IEEE, 2015.
 [38] T. Nöll, A. Pagani, and D. Stricker. Markerless camera pose estimation – an overview. In A. Middel, I. Scheler, and H. Hagen, editors, Visualization of Large and Unstructured Data Sets – Applications in Geospatial Planning, Modeling and Engineering (IRTG 1131 Workshop), volume 19 of OpenAccess Series in Informatics (OASIcs), pages 45–54, Dagstuhl, Germany, 2011. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
 [39] C. F. Olson. A general method for geometric feature matching and model extraction. Int. J. Comput. Vision, 45(1):39–54, 2001.
 [40] C. Olsson, A. Eriksson, and R. Hartley. Outlier removal using duality. In Proc. 2010 Conf. Comput. Vision Pattern Recognition, pages 1450–1457. IEEE, 2010.
 [41] C. Olsson, F. Kahl, and M. Oskarsson. Branch-and-bound methods for Euclidean registration problems. IEEE Trans. Pattern Anal. Machine Intelligence, 31(5):783–794, 2009.
 [42] D. P. Paudel, A. Habed, C. Demonceaux, and P. Vasseur. LMI-based 2D-3D registration: From uncalibrated images to Euclidean scene. In Proc. 2015 Conf. Comput. Vision Pattern Recognition, pages 4494–4502, 2015.
 [43] D. P. Paudel, A. Habed, C. Demonceaux, and P. Vasseur. Robust and optimal sum-of-squares-based point-to-plane registration of image sets and structured scenes. In Proc. 2015 Int. Conf. Comput. Vision, pages 2048–2056, 2015.
 [44] T. Sattler, B. Leibe, and L. Kobbelt. Fast image-based localization using direct 2D-to-3D matching. In Proc. 2011 Int. Conf. Comput. Vision, pages 667–674. IEEE, 2011.
 [45] T. Sattler, B. Leibe, and L. Kobbelt. Improving image-based localization by active correspondence search. In Proc. 2012 European Conf. Comput. Vision, pages 752–765. Springer-Verlag, 2012.
 [46] K. Sim and R. Hartley. Removing outliers using the L∞ norm. In Proc. 2006 Conf. Comput. Vision Pattern Recognition, volume 1, pages 485–494. IEEE, 2006.
 [47] L. Svarm, O. Enqvist, F. Kahl, and M. Oskarsson. City-scale localization for cameras with known vertical direction. IEEE Trans. Pattern Anal. Machine Intelligence, 2016.
 [48] L. Svärm, O. Enqvist, M. Oskarsson, and F. Kahl. Accurate localization and pose estimation for large 3D models. In Proc. 2014 Conf. Comput. Vision Pattern Recognition, pages 532–539. IEEE, 2014.
 [49] J. Yang, H. Li, D. Campbell, and Y. Jia. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Machine Intelligence, 38(11):2241–2254, 2016.
 [50] J. Yu, A. Eriksson, T.-J. Chin, and D. Suter. An adversarial optimization approach to efficient outlier removal. In Proc. 2011 Int. Conf. Comput. Vision, pages 399–406. IEEE, 2011.