Multiview triangulation refers to the problem of locating the 3D point given its projections in multiple views of known calibration and pose. It plays a fundamental role in many applications of computer vision, , structure-from-motion[3, 34, 43], visual(-inertial) odometry [12, 25, 29] and simultaneous localization and mapping [18, 36, 39].
Under the assumption of perfect information (, image measurements, calibration and pose data without noise and outliers), triangulation simply amounts to intersecting the backprojected rays corresponding to the same point. In practice, however, noise and outliers are often inevitable. This makes the triangulation problem nontrivial. From a practical perspective, the following aspects should be taken into account when considering a triangulation method:
1. Is it applicable to multiple views? Some methods are developed specifically for two or three views (, two-view optimal methods [16, 22, 27, 31, 37], two-view midpoint methods [6, 16, 28], three-view optimal methods [7, 17, 26, 46]). For more than two or three views, these methods are not directly applicable unless they are incorporated in, for example, a RANSAC  framework.
2. Is it robust to outliers? Many existing multiview triangulation methods are sensitive to outliers, , the linear methods , the midpoint-based methods [41, 52], the -optimal methods [5, 19, 21, 33] and the -optimal methods [4, 10, 15, 20]. To deal with outliers, various methods have been proposed, , outlier rejection using the norm [30, 38, 44], -th median minimization  and iteratively reweighted least squares . We refer to  for a comprehensive review of the robust triangulation methods.
3. Is it fast? While the aforementioned methods can handle a moderate number of outliers, they either fail or incur excessive computational cost at high outlier ratios . For this reason, RANSAC is often recommended as a preprocessing step [30, 38, 44]. In , an efficient method using two-view RANSAC is proposed.
4. Does it estimate the uncertainty? To our knowledge, none of the aforementioned works provide the uncertainty estimate for the triangulated point. Knowing this uncertainty can be useful for point cloud denoising , robust localization [42, 48] and mapping [9, 35], among others.
In this work, we propose a robust and efficient method for uncertainty-aware multiview triangulation. Our contributions are summarized as follows:
In Sect. 3.1, we propose an outlier rejection scheme using two-view RANSAC with the midpoint method. By reformulating the midpoint, we screen out the bad samples even before computing the midpoint. This improves the efficiency when the outlier ratio is high.
In Sect. 3.2, we revisit three existing local optimization methods, one of which is the Gauss-Newton method. For this method, we present an efficient computation of the Jacobian matrix. We closely evaluate the three methods with an iterative update of the inlier set.
In Sect. 3.3, we model the uncertainty of a triangulated point as a function of three factors: the number of (inlying) cameras, the mean reprojection error and the maximum parallax angle. We propose to learn this model from extensive simulations, so that at test time, we can estimate the uncertainty by interpolation. The estimated uncertainty can be used to control the 3D accuracy.
2 Preliminaries and Notation
We use bold lowercase letters for vectors, bold uppercase letters for matrices, and light letters for scalars. We denote the Hadamard product, division and square root by, and , respectively. The Euclidean norm of a vector is denoted by , and the unit vector by . The angle between two vectors and is denoted by . We denote the vectorization of an matrix by and its inverse by .
Consider a 3D point in the world reference frame and a perspective camera observing this point. In the camera reference frame, the 3D point is given by , where and are the rotation and translation that relate the reference frame to the world, is the extrinsic matrix, and is the homogeneous coordinates of . In the world frame, the camera position is given by . Let be the homogeneous pixel coordinates of the point and the camera calibration matrix. Then, the normalized image coordinates are obtained by .
Let be the set of all views in which the point is observed. The aim of multiview triangulation is to find the best estimate of given that noisy , , and are known for all . Once we have the estimate , the 3D error is given by , and the 2D error (aka the reprojection error) is given by
where is the reprojection of in . To compute compactly, we define the following matrices:
where is the element of at the -th row and -th column. Then, can be obtained as follows:
We provide the derivation in the supplementary material. Note that , and are independent of the point, and thus can be precomputed for efficiency.
We define the positive z-axis of the camera as the forward direction. This means that if the -th element of is negative, the point is behind the camera , violating the cheirality . Hence, given the estimated point , the corresponding set of inliers is obtained by
where indicates the -th element, and is the inlier threshold. We denote the number of elements of by . We define the maximum parallax angle of as follows:
where and are the two corresponding unit rays from camera and expressed in the world frame, ,
3.1 Fast Two-View RANSAC for Outlier Rejection
To obtain the initial solution and inlier set, we perform two-view RANSAC as in . Our method has two notable differences to : First, instead of the DLT method , we use the midpoint method [6, 16], which is faster and as accurate unless the parallax is very low [16, 28]. Second, we prescreen the samples before performing the two-view triangulation. Our method consists of the following steps:
1. Check the normalized epipolar error.
Let camera and be the two-view sample. The normalized epipolar error  of the sample is defined as
2. Check the parallax angle.
The raw parallax  defined as is a rough estimate of the parallax angle if and are both inliers. If is too small, the triangulation is inaccurate . If it is too large, the sample most likely contains an outlier, because such a point is rarely matched in practice due to a severe viewpoint change. We check from its cosine:
3. Check additional degeneracy.
Likewise, if or is too small, the epipolar geometry degenerates (see Fig. 1). To avoid degeneracy, we also check these angles from their cosines:
4. Check the depths of the midpoint anchors.
Let and be the depths of the midpoint anchors (see Fig. 1). In the supplementary material, we show that
Since , we check the signs of and to ensure that and are both positive.
5. Evaluate the midpoint w.r.t. the two views.
Only when the two-view sample passes all of the aforementioned checks, we compute the midpoint:
Then, we check the cheirality and reprojection errors in the two views. The entire procedure is detailed in Alg. 2.
Each midpoint from the two-view samples becomes a hypothesis for and is scored based on its reprojection error and cheirality. Specifically, we use the approach of  and find the hypothesis that minimizes the following cost:
3.2 Iterative Local Optimization
Once we have the initial triangulation result and the inlier set from the two-view RANSAC, we perform local optimization for refinement. Our approach is similar to , except that we perform the optimization only at the end of RANSAC. We compare three optimization methods:
DLT and LinLS : These two linear methods minimize the algebraic errors in closed form. For the formal descriptions, we refer to . They were originally developed for two-view triangulation, but they can be easily extended to multiple views. We construct the linear system with only the inliers, solve it and update the inlier set. This is repeated until the inlier set converges.
GN: This nonlinear method minimizes the geometric errors using the Gauss-Newton algorithm. After each update of the solution, we update the inlier set.
The GN method requires the computation of the Jacobian matrix in each iteration. In the following, we present an efficient method for computing . Recall that we are minimizing , which we know from (1) is equal to , where . This means that we can obtain by stacking
for all . We now define the following vectors:
where and respectively indicate the elements of and at the -th row and -th column, and indicate the -th element of . Then, we can rewrite (23) as
We provide the derivation in the supplementary material. Since and can be precomputed independently of the point, the Jacobian can computed more efficiently. Alg. 3 summarizes the GN method.
3.3 Practical Uncertainty Estimation
We model the uncertainty of the triangulated point as a function of three factors: the number of inlying views (), the mean reprojection error in those views () and the maximum parallax angle () defined by (12).
To this end, we run a large number of simulations in various settings and aggregate the 3D errors for each different range of factors (see Fig. 2). We then store these data on a 3D regular grid that maps to the uncertainty . At test time, we estimate the uncertainty by performing trilinear interpolation on this grid.
We point out two things in our implementation: First, to reduce the small sample bias in , we perform monotone smoothing that enforces to increase with and decreases with and . The smoothing method is described and demonstrated in the supplementary material. Second, we limit the number of pairs we evaluate for computing in (13). This curbs the computational cost when is very large. Alg. 4 summarizes the procedure.
4.1 Uncertainty Estimation
To find out how the different factors impact the 3D accuracy of triangulation, we run a large number of simulations in various settings configured by the following parameters:
: number of cameras observing the point.
: distance between the ground-truth point and the origin.
: std. dev. of Gaussian noise in the image coordinates.
: number of independent simulation runs for each configuration .
The parameter values are specified in the supplementary material. The simulations are generated as follows: We create cameras, of those randomly located inside a sphere of unit diameter at the origin. We place one of the two remaining cameras at a random point on the sphere’s surface and the other at its antipode. This ensures that the geometric span of the cameras is equal to one unit. The size and the focal length of the images are set to and pixel, respectively, the same as those of . Next, we create a point at and orient the cameras randomly until the point is visible in all images. Then, we add the image noise of to perturb the image coordinates.
For triangulation, we initialize the point using the DLT method and refine it using the GN method. In this experiment, we assume that all points are always inliers, so we do not update the inlier set during the optimization. Fig. 2 shows the 3D error distribution with respect to different numbers of cameras (), mean reprojection errors () and maximum parallax angle (). In general, we observe that the 3D accuracy improves with more cameras, smaller and larger . However, this effect diminishes past a certain level. For example, the difference between the 30 and 50 cameras is much smaller than the difference between the 2 and 3. Also, when is sufficiently large, the 3D accuracy is less sensitive to the change in , and .
Fig. 2 clearly indicates that we must take into account all these three factors when estimating the 3D uncertainty of a triangulated point. Marginalizing any one of them would reduce the accuracy. This observation agrees with our intuition, as each factor conveys important independent information about the given triangulation problem.
4.2 Triangulation Performance
We evaluate the performance of our method on synthetic data. The simulation is configured in a similar way as in the previous section. The difference is that we set , , pixel, thousand, and we perturb some of the measurements by more than 10 pixel, turning them into outliers. The outlier ratio is set to , , , and percent. Varying and the outlier ratio results in configurations, so in total, this amounts to two million unique triangulation problems.
On this dataset, we compare our method against the state of the art ( by Schönberger and Frahm) with and without the local optimization (DLT, LinLS and GN). In Alg. 1, we set , pix, , pix, , and . Fig. 3 shows the results. On average, ours and  perform similarly (but ours is faster, as will be shown later). For both methods, the local optimization substantially improves the 2D and 3D accuracy. Thanks to the iterative update of the inlier set, we also see a significant gain in recall. Among the optimization methods, DLT and GN show similar performance in all criteria, while LinLS exhibits larger 3D error than the other two. We provide a closer comparison between DLT and GN in the next section.
In general, when the point is far and the outlier ratio is high, the performance degrades for all methods. At any fixed outlier ratio, we observe that the 3D error tends to grow with the point distance. However, the same cannot be said for the 2D error. This is because given sufficient parallax, the 2D accuracy is mostly influenced by the image noise statistics, rather than the geometric configurations.
We also evaluate the accuracy after pruning the most uncertain points using our method (Sect. 4.1). In Fig. 4, we plot the error histograms of the points with different levels of the estimated 3D uncertainty (). It shows that with a smaller threshold on , we get to prune more points with larger 3D error. Fig. 5 shows the cumulative 3D error plots. It illustrates that thresholding on gives us some control over the upper bound of the 3D error. As a result, we are able to trade off the number points for 3D accuracy by varying the threshold level. This is shown in Fig. 6.
To compare the timings, all methods are implemented in MATLAB and run on a laptop CPU (Intel i7-4810MQ, 2.8GHz). Tab. 1 provides the relative speed of our two-view RANSAC compared to . It shows that ours is faster, especially when the point is far and the outlier ratio is high. This demonstrates the advantage of the early termination of two-view triangulation (Alg. 2). In Tab. 2, we present the timings of the local optimization and uncertainty estimation. We found that DLT is slightly faster than LinLS and almost twice faster than GN.
|4.15, 4.15||4.43, 4.32||4.81, 4.40||5.90, 4.47|
|4.61, 4.39||4.80, 4.45||4.78, 4.08||7.33, 4.79|
|5.28, 4.56||5.64, 4.72||5.87, 4.38||10.4, 5.22|
|7.46, 5.09||8.06, 5.30||9.13, 4.93||19.9, 6.45|
|28.6, 7.64||31.8, 8.14||40.8, 8.86||72.6, 13.1|
4.3 DLT vs. Gauss-Newton Method
To compare the accuracy of DLT and GN more closely, we perform additional simulations in outlier-free scenarios. The simulation is set up in a similar way as in Section 4.1 (see the supplementary material for details).
In terms of 3D accuracy, we found that the two methods perform almost equally most of the time. The comparison is inconsistent only when the maximum parallax angle is very small (less than 6 deg or so). We show this result in the supplementary material.
As for the 2D accuracy, the difference is sometimes noticeable. Fig. 7 shows the mean and the maximum difference of 2D error. On average, GN offers less gain for more cameras, smaller noise and lower parallax. This explains why we could not see the difference between DLT and GN in Fig. 3. However, the bottom row of Fig. 7 reveals that GN sometimes provides a significant gain over DLT even when the average difference is small.
4.4 Results on Real Data
We evaluate our method on three real datasets: Dinosaur , Corridor  and Notre Dame . We only consider the points that are visible in three or more views. In our algorithm, we use the same parameters as in Sect. 4.2 and discard the point that is visible in less than three views after RANSAC. Fig. 8 shows the 3D reconstruction results.
In this work, we presented a robust and efficient method for multiview triangulation and uncertainty estimation. We proposed several early termination criteria for two-view RANSAC using the midpoint method, and showed that it improves the efficiency when the outlier ratio is high. We also compared the three local optimization methods (DLT, LinLS and GN), and found that DLT and GN are similar (but better than LinLS) in terms of 3D accuracy, while GN is sometimes much more accurate than DLT in terms of 2D accuracy. Finally, we proposed a novel method to estimate the uncertainty of a triangulated point based on the number of (inlying) views, the mean reprojection error and the maximum parallax angle. We showed that the estimated uncertainty can be used to control the 3D accuracy. An extensive evaluation was performed on both synthetic and real data.
-  Oxford Multiview Datasets. http://www.robots.ox.ac.uk/~vgg/data/data-mview.html.
-  K. Aftab and R. Hartley. Convergence of iteratively re-weighted least squares to robust m-estimators. In IEEE Winter Conf. Appl. Comput. Vis., pages 480–487, 2015.
-  S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building rome in a day. Commun. ACM, 54(10):105–112, 2011.
S. Agarwal, N. Snavely, and S. M. Seitz.
Fast algorithms for problems in multiview geometry.
IEEE Conf. Comput. Vis. Pattern Recognit., pages 1–8, 2008.
-  C. Aholt, S. Agarwal, and R. Thomas. A QCQP approach to triangulation. In Eur. Conf. Comput. Vis., pages 654–667, 2012.
-  P. A. Beardsley, A. Zisserman, and D. W. Murray. Sequential updating of projective and affine structure from motion. Int. J. Comput. Vis., 23(3):235–259, 1997.
-  M. Byröd, K. Josephson, and K. Åström. Fast optimal three view triangulation. In Asian Conf. Comput. Vis., pages 549–559, 2007.
-  O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Pattern Recognition, pages 236–243, 2003.
-  A. Concha and J. Civera. DPPTAM: Dense piecewise planar tracking and mapping from a monocular sequence. In IEEE/RSJ Int. Conf. Intell. Robots. Syst., pages 5686–5693, 2015.
-  Z. Dai, Y. Wu, F. Zhang, and H. Wang. A novel fast method for problems in multiview geometry. In Eur. Conf. Comput. Vis., pages 116–129, 2012.
-  M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.
-  C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza. SVO: semidirect visual odometry for monocular and multicamera systems. IEEE Trans. Robot., 33(2):249–265, 2017.
-  S. H. Lee and J. Civera. Geometric interpretations of the normalized epipolar error. CoRR, abs/2008.01254, 2020.
-  R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2 edition, 2003.
-  R. I. Hartley and F. Schaffalitzky. minimization in geometric reconstruction problems. In IEEE Conf. Comput. Vis. Pattern Recognit., 2004.
-  R. I. Hartley and P. Sturm. Triangulation. Comput. Vis. Image Underst., 68(2):146–157, 1997.
-  J. Hedborg, A. Robinson, and M. Felsberg. Robust three-view triangulation done fast. In IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pages 152–157, 2014.
-  D. C. Herrera, K. Kim, J. Kannala, K. Pulli, and J. Heikkilä. DT-SLAM: Deferred triangulation for robust SLAM. In IEEE Int. Conf. 3D Vis., pages 609–616, 2014.
-  F. Kahl, S. Agarwal, M. K. Chandraker, D. Kriegman, and S. Belongie. Practical global optimization for multiview geometry. Int. J. Comput. Vis., 79(3):271–284, 2008.
-  F. Kahl and R. Hartley. Multiple-view geometry under the -norm. IEEE Trans. Pattern Anal. Mach. Intell., 30(9):1603–1617, 2008.
-  F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstruction problems. Int. J. Comput. Vis., 74(1):3–15, 2007.
-  K. Kanatani, Y. Sugaya, and H. Niitsuma. Triangulation from two views revisited: Hartley-Sturm vs. optimal correction. In Brit. Mach. Vis. Conf., pages 173–182, 2008.
-  L. Kang, L. Wu, and Y.-H. Yang. Robust multi-view triangulation via optimal inlier selection and 3D structure refinement. Pattern Recognition, 47(9):2974 – 2992, 2014.
-  Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruction. IEEE Trans. Pattern Anal. Mach. Intell., 29(10):1834–1847, 2007.
-  G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In IEEE/ACM Int. Symp. Mixed Augmented Reality, 2007.
-  Z. Kukelova, T. Pajdla, and M. Bujnak. Fast and stable algebraic solution to three-view triangulation. In IEEE Int. Conf. 3D Vis., pages 326–333, 2013.
-  S. H. Lee and J. Civera. Closed-form optimal two-view triangulation based on angular errors. In IEEE Int. Conf. Comput. Vis., pages 2681–2689, 2019.
-  S. H. Lee and J. Civera. Triangulation: Why optimize? In Brit. Mach. Vis. Conf., 2019.
-  S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res., 34(3):314–334, 2015.
-  H. Li. A practical algorithm for triangulation with outliers. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1–8, 2007.
-  P. Lindstrom. Triangulation made easy. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1554–1561, 2010.
-  H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293(5828):133–135, 1981.
-  F. Lu and R. Hartley. A fast optimal algorithm for triangulation. In Asian Conf. Comput. Vis., pages 279–288, 2007.
-  P. Moulon, P. Monasse, and R. Marlet. Global fusion of relative motions for robust, accurate and scalable structure from motion. In IEEE Int. Conf. Comput. Vis., pages 3248–3255, 2013.
-  R. Mur-Artal and J. Tardos. Probabilistic semi-dense mapping from highly accurate feature-based monocular SLAM. In Proc. Robot.: Sci. Syst., 2015.
R. Mur-Artal and J. D. Tardós.
ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras.IEEE Trans. Robot., 33(5):1255–1262, 2017.
-  J. Oliensis. Exact two-image structure from motion. IEEE Trans. Pattern Anal. Mach. Intell., 24(12):1618–1633, 2002.
-  C. Olsson, A. Eriksson, and R. Hartley. Outlier removal using duality. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1450–1457, 2010.
-  T. Qin, P. Li, and S. Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot., 34(4):1004–1020, 2018.
-  R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J. Frahm. Usac: A universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):2022–2038, 2013.
-  S. Ramalingam, S. K. Lodha, and P. Sturm. A generic structure-from-motion framework. Comput. Vis. Image Underst., 103(3):218 – 228, 2006.
-  T. Sattler, B. Leibe, and L. Kobbelt. Fast image-based localization using direct 2D-to-3D matching. In Int. Conf. Comput. Vis., pages 667–674, 2011.
-  J. L. Schönberger and J. Frahm. Structure-from-Motion revisited. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 4104–4113, 2016.
-  K. Sim and R. Hartley. Removing outliers using the norm. In IEEE Conf. Comput. Vis. Pattern Recognit., volume 1, pages 485–494, 2006.
-  N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph., 25(3):835–846, 2006.
-  H. Stewénius, F. Schaffalitzky, and D. Nister. How hard is 3-view triangulation really? In IEEE Int. Conf. on Comput. Vis., volume 1, pages 686–693, 2005.
-  J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of rgb-d slam systems. In IEEE/RSJ Int. Conf. Intell. Robots. Syst., 2012.
L. Svärm, O. Enqvist, M. Oskarsson, and F. Kahl.
Accurate localization and pose estimation for large 3D models.In IEEE Conf. Comput. Vis. Pattern Recognit., pages 532–539, 2014.
-  P. Torr, A. Zisserman, and S. Maybank. Robust detection of degenerate configurations while estimating the fundamental matrix. Comput. Vis. Image Underst., 71(3):312 – 333, 1998.
-  P. H. S. Torr and A. Zisserman. Robust computation and parameterization of multiple view relations. In IEEE Int. Conf. Comput. Vis., 1998.
-  K. Wolff, C. Kim, H. Zimmer, C. Schroers, M. Botsch, O. Sorkine-Hornung, and A. Sorkine-Hornung. Point cloud noise and outlier removal for image-based 3D reconstruction. In Int. Conf. 3D Vis., 2016.
-  K. Yang, W. Fang, Y. Zhao, and N. Deng. Iteratively reweighted midpoint method for fast multiple view triangulation. IEEE Robot. Autom. Lett., 4(2):708–715, 2019.