1 Introduction
Multiview triangulation refers to the problem of locating a 3D point given its projections in multiple views of known calibration and pose. It plays a fundamental role in many applications of computer vision, e.g., structure-from-motion
[3, 34, 43], visual(-inertial) odometry [12, 25, 29] and simultaneous localization and mapping [18, 36, 39]. Under the assumption of perfect information (i.e., image measurements, calibration and pose data without noise and outliers), triangulation simply amounts to intersecting the back-projected rays corresponding to the same point. In practice, however, noise and outliers are often inevitable. This makes the triangulation problem nontrivial. From a practical perspective, the following aspects should be taken into account when considering a triangulation method:
1. Is it applicable to multiple views? Some methods are developed specifically for two or three views (e.g., two-view optimal methods [16, 22, 27, 31, 37], two-view midpoint methods [6, 16, 28], three-view optimal methods [7, 17, 26, 46]). For more than two or three views, these methods are not directly applicable unless they are incorporated in, for example, a RANSAC [11] framework.
2. Is it robust to outliers? Many existing multiview triangulation methods are sensitive to outliers, e.g., the linear methods [16], the midpoint-based methods [41, 52], the $L_2$ optimal methods [5, 19, 21, 33] and the $L_\infty$ optimal methods [4, 10, 15, 20]. To deal with outliers, various methods have been proposed, e.g., outlier rejection using the $L_\infty$ norm [30, 38, 44], the $k$-th median minimization [24] and iteratively reweighted least squares [2]. We refer to [23] for a comprehensive review of robust triangulation methods.
3. Is it fast? While the aforementioned methods can handle a moderate number of outliers, they either fail or incur excessive computational cost at high outlier ratios [43]. For this reason, RANSAC is often recommended as a preprocessing step [30, 38, 44]. In [43], an efficient method using two-view RANSAC is proposed.
4. Does it estimate the uncertainty? To our knowledge, none of the aforementioned works provide the uncertainty estimate for the triangulated point. Knowing this uncertainty can be useful for point cloud denoising [51], robust localization [42, 48] and mapping [9, 35], among others.
In this work, we propose a robust and efficient method for uncertainty-aware multiview triangulation. Our contributions are summarized as follows:


In Sect. 3.1, we propose an outlier rejection scheme using two-view RANSAC with the midpoint method. By reformulating the midpoint, we screen out the bad samples even before computing the midpoint. This improves the efficiency when the outlier ratio is high.

In Sect. 3.2, we revisit three existing local optimization methods, one of which is the Gauss-Newton method. For this method, we present an efficient computation of the Jacobian matrix. We closely evaluate the three methods with an iterative update of the inlier set.

In Sect. 3.3, we model the uncertainty of a triangulated point as a function of three factors: the number of (inlying) cameras, the mean reprojection error and the maximum parallax angle. We propose to learn this model from extensive simulations, so that at test time, we can estimate the uncertainty by interpolation. The estimated uncertainty can be used to control the 3D accuracy.
The proposed approach is detailed in Alg. 1. See the supplementary material for the nomenclature. To download our code, go to http://seonghunlee.github.io.
2 Preliminaries and Notation
We use bold lowercase letters for vectors, bold uppercase letters for matrices, and light letters for scalars. We denote the Hadamard product, division and square root by $\odot$, $\oslash$ and $\sqrt[\odot]{\cdot}$, respectively. The Euclidean norm of a vector $\mathbf{v}$ is denoted by $\|\mathbf{v}\|$, and the unit vector by $\widehat{\mathbf{v}}=\mathbf{v}/\|\mathbf{v}\|$. The angle between two vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted by $\angle(\mathbf{a},\mathbf{b})$. We denote the vectorization of an $m\times n$ matrix by $\mathrm{vec}(\cdot)$ and its inverse by $\mathrm{vec}^{-1}_{m\times n}(\cdot)$.

Consider a 3D point $\mathbf{x}$ in the world reference frame and a perspective camera observing this point. In the camera reference frame, the 3D point is given by $\mathbf{x}^c=\mathbf{P}\widetilde{\mathbf{x}}$, where $\mathbf{R}$ and $\mathbf{t}$ are the rotation and translation that relate the camera reference frame to the world, $\mathbf{P}=[\mathbf{R}\ \ \mathbf{t}]$ is the extrinsic matrix, and $\widetilde{\mathbf{x}}=[\mathbf{x}^\top\ 1]^\top$ is the homogeneous coordinates of $\mathbf{x}$. In the world frame, the camera position is given by $\mathbf{c}=-\mathbf{R}^\top\mathbf{t}$. Let $\mathbf{u}$ be the homogeneous pixel coordinates of the point and $\mathbf{K}$ the camera calibration matrix. Then, the normalized image coordinates are obtained by $\mathbf{f}=\mathbf{K}^{-1}\mathbf{u}$.

Let $\mathcal{V}$ be the set of all views in which the point is observed. The aim of multiview triangulation is to find the best estimate $\hat{\mathbf{x}}$ of $\mathbf{x}$, given that noisy $\mathbf{u}_i$, $\mathbf{K}_i$, $\mathbf{R}_i$ and $\mathbf{t}_i$ are known for all $i\in\mathcal{V}$. Once we have the estimate $\hat{\mathbf{x}}$, the 3D error is given by $\|\hat{\mathbf{x}}-\mathbf{x}\|$, and the 2D error (aka the reprojection error) is given by
(1) 
where $\hat{\mathbf{u}}_i$ is the reprojection of the estimated point in view $i$. To compute the reprojection error compactly, we define the following matrices:
(2)  
(3)  
(4)  
(5)  
(6)  
(7)  
(8)  
(9) 
where $[\mathbf{A}]_{jk}$ denotes the element of a matrix $\mathbf{A}$ at the $j$th row and $k$th column. Then, the reprojection error can be obtained as follows:
(10) 
We provide the derivation in the supplementary material. Note that several of these matrices are independent of the point, and thus can be precomputed for efficiency.
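The factored computation of (2)-(10) is derived in the supplementary material. As a minimal illustration, the per-view reprojection error can also be computed directly; all names in this sketch are our own, not the paper's:

```python
import numpy as np

def reprojection_error(x, K, R, t, u):
    """Reproject a world point x into one view and return the pixel error.

    K: 3x3 calibration, R/t: world-to-camera rotation/translation,
    u: observed pixel coordinates (2,). Illustrative names only.
    """
    xc = R @ x + t                  # point in the camera reference frame
    if xc[2] <= 0:                  # behind the camera: cheirality violated
        return np.inf
    proj = K @ (xc / xc[2])         # perspective projection to pixels
    return np.linalg.norm(proj[:2] - u)
```

The cheirality test inside the sketch anticipates the inlier definition of (11) below.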
We define the positive z-axis of the camera as the forward direction. This means that if the third element of the point in the reference frame of camera $i$ (i.e., its depth) is negative, the point is behind the camera, violating the cheirality [14]. Hence, given the estimated point, the corresponding set of inliers is obtained by
(11) 
where $(\cdot)_j$ indicates the $j$th element, and $\eta$ is the inlier threshold. We denote the number of elements of a set $\mathcal{S}$ by $|\mathcal{S}|$. We define the maximum parallax angle of the estimated point as follows:
(12)  
(13) 
where $\mathbf{f}^w_i$ and $\mathbf{f}^w_j$ are the two corresponding unit rays from cameras $i$ and $j$ expressed in the world frame, i.e.,
(14) 
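A direct sketch of the maximum parallax angle in (12)-(13), evaluating all ray pairs (the paper additionally caps the number of pairs evaluated, see Sect. 3.3; that cap is omitted here):

```python
import numpy as np
from itertools import combinations

def max_parallax_angle(cam_centers, x):
    """Maximum angle between the rays from the camera centers to point x.

    A direct O(n^2) evaluation of the definition; cam_centers is a list
    of camera positions in the world frame.
    """
    rays = [(x - c) / np.linalg.norm(x - c) for c in cam_centers]
    best = 0.0
    for fi, fj in combinations(rays, 2):
        cosang = np.clip(fi @ fj, -1.0, 1.0)   # guard against fp overshoot
        best = max(best, np.arccos(cosang))
    return best
```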
3 Method
3.1 Fast Two-View RANSAC for Outlier Rejection
To obtain the initial solution and inlier set, we perform two-view RANSAC as in [43]. Our method has two notable differences from [43]: First, instead of the DLT method [16], we use the midpoint method [6, 16], which is faster and comparably accurate unless the parallax is very low [16, 28]. Second, we prescreen the samples before performing the two-view triangulation. Our method consists of the following steps:
1. Check the normalized epipolar error.
Let cameras $i$ and $j$ form the two-view sample.
The normalized epipolar error [13] of the sample is defined as
(15) 
where $\hat{\mathbf{t}}$ is the unit vector of the baseline between the two cameras, and $\mathbf{f}^w_i$, $\mathbf{f}^w_j$ are the corresponding rays in the world frame given by (14). If views $i$ and $j$ both yield inliers, then this error must be small [32].
2. Check the parallax angle.
The raw parallax [28], i.e., the angle between the two back-projected rays,
is a rough estimate of the parallax angle if $i$ and $j$ are both inliers.
If the parallax is too small, the triangulation is inaccurate [28].
If it is too large, the sample most likely contains an outlier, because such a point is rarely matched in practice due to a severe viewpoint change.
We therefore check the raw parallax via its cosine:
(16) 
3. Check additional degeneracy.
Likewise, if the angle between either ray and the baseline is too small, the epipolar geometry degenerates (see Fig. 1).
To avoid degeneracy, we also check these angles via their cosines:
(17) 
4. Check the depths of the midpoint anchors.
Consider the depths of the two midpoint anchors (see Fig. 1).
In the supplementary material, we show that
(18) 
where
(19)  
(20) 
We check the signs of (19) and (20) to ensure that the two anchor depths are both positive.
5. Evaluate the midpoint w.r.t. the two views.
Only when the two-view sample passes all of the aforementioned checks do we compute the midpoint:
(21) 
Then, we check the cheirality and reprojection errors in the two views. The entire procedure is detailed in Alg. 2.
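The five checks above can be sketched as follows. The thresholds and the exact forms of the epipolar, parallax and degeneracy tests are illustrative stand-ins for Eqs. (15)-(21) and Alg. 2, not the paper's tuned values:

```python
import numpy as np

def two_view_midpoint(c0, f0, c1, f1, max_epi=1e-2,
                      cos_min_par=np.cos(np.radians(1.0)),
                      cos_max_par=np.cos(np.radians(170.0)),
                      cos_degen=np.cos(np.radians(1.0))):
    """Prescreened midpoint for one two-view RANSAC sample.

    c0, c1: camera centers; f0, f1: unit rays in the world frame.
    Returns the midpoint, or None if any check fails.
    """
    t = c1 - c0
    tn = t / np.linalg.norm(t)
    # 1. epipolar test: the two rays and the baseline must be near-coplanar
    if abs(f0 @ np.cross(tn, f1)) > max_epi:
        return None
    # 2. raw parallax test on the cosine of the angle between the rays
    cos_par = f0 @ f1
    if cos_par > cos_min_par or cos_par < cos_max_par:
        return None
    # 3. degeneracy test: neither ray may be near-parallel to the baseline
    if abs(f0 @ tn) > cos_degen or abs(f1 @ tn) > cos_degen:
        return None
    # 4. anchor depths: least-squares solve of c0 + d0*f0 = c1 + d1*f1
    A = np.stack([f0, -f1], axis=1)
    d, *_ = np.linalg.lstsq(A, t, rcond=None)
    if d[0] <= 0 or d[1] <= 0:      # an anchor behind its camera: reject
        return None
    # 5. midpoint of the two anchors
    return 0.5 * ((c0 + d[0] * f0) + (c1 + d[1] * f1))
```

Rejecting a sample at steps 1-4 skips the midpoint computation entirely, which is where the speed-up at high outlier ratios comes from.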
Each midpoint from the two-view samples becomes a hypothesis for the point and is scored based on its reprojection error and cheirality. Specifically, we use the approach of [50] and find the hypothesis that minimizes the following cost:
(22) 
3.2 Iterative Local Optimization
Once we have the initial triangulation result and the inlier set from the two-view RANSAC, we perform local optimization for refinement. Our approach is similar to [8], except that we perform the optimization only at the end of RANSAC. We compare three optimization methods:

DLT and LinLS [16]: These two linear methods minimize the algebraic errors in closed form. For the formal descriptions, we refer to [16]. They were originally developed for two-view triangulation, but they can be easily extended to multiple views. We construct the linear system with only the inliers, solve it and update the inlier set. This is repeated until the inlier set converges.
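For reference, a minimal multi-view DLT sketch, assuming pinhole projection matrices; the iterative inlier update described above is omitted:

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Multi-view DLT triangulation [16]: each view contributes two rows
    to a homogeneous linear system A x = 0, solved by SVD.

    projections: list of 3x4 matrices P = K [R | t]; pixels: list of (u, v).
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])     # u * (third row) - (first row)
        rows.append(v * P[2] - P[1])     # v * (third row) - (second row)
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    xh = Vt[-1]                          # right singular vector of smallest s.v.
    return xh[:3] / xh[3]                # dehomogenize
```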

GN: This nonlinear method minimizes the geometric errors using the Gauss-Newton algorithm. After each update of the solution, we update the inlier set.
The GN method requires the computation of the Jacobian matrix in each iteration. In the following, we present an efficient way of computing it. Recall that we are minimizing the sum of squared reprojection errors, which we know from (1) can be written in terms of stacked per-view residuals. This means that we can obtain the Jacobian by stacking
(23) 
for all the views in the inlier set. We now define the following vectors:
(24)  
(25)  
(26)  
(27)  
(28)  
(29)  
(30)  
(31)  
(32) 
where the subscripts indicate the row and column indices of the respective matrices and the element indices of the respective vectors. Then, we can rewrite (23) as
(33)  
(34) 
We provide the derivation in the supplementary material. Since these quantities can be precomputed independently of the point, the Jacobian can be computed more efficiently. Alg. 3 summarizes the GN method.
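As a concrete illustration, the following sketch implements the GN refinement with the standard per-view Jacobian; the paper's precomputation trick of (24)-(34) is not reproduced here, and all names are our own:

```python
import numpy as np

def refine_gauss_newton(x0, rotations, translations, obs, n_iter=20):
    """Gauss-Newton refinement of a triangulated point.

    Minimizes the reprojection error in normalized image coordinates.
    obs: list of observed normalized image coordinates (x, y) per view.
    The inlier-set update of Alg. 3 is omitted for brevity.
    """
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        JtJ = np.zeros((3, 3))
        Jtr = np.zeros(3)
        for R, t, m in zip(rotations, translations, obs):
            X, Y, Z = R @ x + t
            r = np.array([X / Z - m[0], Y / Z - m[1]])   # residual
            # chain rule: d(projection)/d(camera point) times R
            Jp = np.array([[1 / Z, 0.0, -X / Z**2],
                           [0.0, 1 / Z, -Y / Z**2]]) @ R
            JtJ += Jp.T @ Jp
            Jtr += Jp.T @ r
        dx = np.linalg.solve(JtJ, Jtr)     # normal equations
        x -= dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x
```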
3.3 Practical Uncertainty Estimation
We model the uncertainty of the triangulated point as a function of three factors: the number of inlying views, the mean reprojection error in those views, and the maximum parallax angle defined by (12).
To this end, we run a large number of simulations in various settings and aggregate the 3D errors for each range of these factors (see Fig. 2). We then store the aggregated data on a 3D regular grid that maps the three factors to the uncertainty of the point. At test time, we estimate the uncertainty by performing trilinear interpolation on this grid.
We point out two things about our implementation: First, to reduce the small-sample bias, we perform monotone smoothing that enforces the uncertainty to increase with the reprojection error and to decrease with the number of views and the parallax angle. The smoothing method is described and demonstrated in the supplementary material. Second, we limit the number of pairs we evaluate when computing the maximum parallax angle in (13). This curbs the computational cost when the number of views is very large. Alg. 4 summarizes the procedure.
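A sketch of the grid lookup. The axis values and the uncertainty values below are hypothetical placeholders; the real grid is filled from the simulations of Sect. 4.1:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid axes: number of inlying views n, mean reprojection
# error e (px), maximum parallax angle b (deg).
n_ax = np.array([2.0, 3.0, 5.0, 10.0, 30.0])
e_ax = np.array([0.0, 0.5, 1.0, 2.0])
b_ax = np.array([1.0, 5.0, 20.0, 60.0])

# Placeholder uncertainty values, chosen to be monotone as in the paper:
# decreasing in n and b, increasing in e. The real values come from data.
sigma = (1.0 / np.sqrt(n_ax))[:, None, None] \
        * (1.0 + e_ax)[None, :, None] / (1.0 + b_ax)[None, None, :]

lookup = RegularGridInterpolator((n_ax, e_ax, b_ax), sigma,
                                 bounds_error=False, fill_value=None)

def estimate_uncertainty(n, e, b):
    """Trilinear interpolation on the (n, e, b) -> uncertainty grid."""
    return float(lookup([(n, e, b)])[0])
```

Setting `fill_value=None` lets the interpolator extrapolate for queries outside the stored ranges; clamping to the grid bounds would be an equally reasonable choice.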
4 Results
4.1 Uncertainty Estimation
To find out how the different factors impact the 3D accuracy of triangulation, we run a large number of simulations in various settings configured by the following parameters:

$m$: number of cameras observing the point.

$d$: distance between the ground-truth point and the origin.

$\sigma$: std. dev. of Gaussian noise in the image coordinates.

$N$: number of independent simulation runs for each configuration $(m, d, \sigma)$.
The parameter values are specified in the supplementary material. The simulations are generated as follows: We create $m$ cameras, $m-2$ of those randomly located inside a sphere of unit diameter at the origin. We place one of the two remaining cameras at a random point on the sphere's surface and the other at its antipode. This ensures that the geometric span of the cameras is equal to one unit. The size and the focal length of the images are set to the same values as those of [47]. Next, we create a point at distance $d$ and orient the cameras randomly until the point is visible in all images. Then, we add Gaussian image noise of std. dev. $\sigma$ to perturb the image coordinates.
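The camera placement described above can be sketched as follows; the rejection sampling of the interior cameras is our own assumption beyond the text:

```python
import numpy as np

def place_cameras(m, seed=0):
    """Sample m camera centers: m-2 inside a sphere of unit diameter at
    the origin, plus an antipodal pair on its surface, so that the
    geometric span of the cameras is exactly one unit."""
    rng = np.random.default_rng(seed)
    centers = []
    while len(centers) < m - 2:            # rejection-sample the interior
        p = rng.uniform(-0.5, 0.5, 3)
        if np.linalg.norm(p) <= 0.5:
            centers.append(p)
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)                 # random direction on the sphere
    centers += [0.5 * d, -0.5 * d]         # surface point and its antipode
    return np.array(centers)
```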
For triangulation, we initialize the point using the DLT method and refine it using the GN method. In this experiment, we assume that all points are always inliers, so we do not update the inlier set during the optimization. Fig. 2 shows the 3D error distribution with respect to different numbers of cameras, mean reprojection errors and maximum parallax angles. In general, we observe that the 3D accuracy improves with more cameras, smaller reprojection errors and larger parallax angles. However, this effect diminishes past a certain level. For example, the difference between 30 and 50 cameras is much smaller than that between 2 and 3. Also, when the parallax is sufficiently large, the 3D accuracy becomes less sensitive to changes in the other factors.
Fig. 2 clearly indicates that we must take all three of these factors into account when estimating the 3D uncertainty of a triangulated point. Marginalizing out any one of them would reduce the accuracy. This observation agrees with our intuition, as each factor conveys important independent information about the given triangulation problem.
4.2 Triangulation Performance
We evaluate the performance of our method on synthetic data. The simulation is configured in a similar way as in the previous section. The difference is that we use a different set of configuration parameters and perturb some of the measurements by more than 10 pixels, turning them into outliers. The outlier ratio is varied over several levels. Varying the configuration parameters and the outlier ratio, we obtain two million unique triangulation problems in total.
On this dataset, we compare our method against the state of the art ([43] by Schönberger and Frahm) with and without the local optimization (DLT, LinLS and GN). In Alg. 1, we keep all the thresholds and parameters fixed across the experiments. Fig. 3 shows the results. On average, ours and [43] perform similarly (but ours is faster, as will be shown later). For both methods, the local optimization substantially improves the 2D and 3D accuracy. Thanks to the iterative update of the inlier set, we also see a significant gain in recall. Among the optimization methods, DLT and GN show similar performance in all criteria, while LinLS exhibits a larger 3D error than the other two. We provide a closer comparison between DLT and GN in the next section.
In general, when the point is far and the outlier ratio is high, the performance degrades for all methods. At any fixed outlier ratio, we observe that the 3D error tends to grow with the point distance. However, the same cannot be said for the 2D error. This is because given sufficient parallax, the 2D accuracy is mostly influenced by the image noise statistics, rather than the geometric configurations.
We also evaluate the accuracy after pruning the most uncertain points using our method (Sect. 4.1). In Fig. 4, we plot the error histograms of the points at different levels of estimated 3D uncertainty. It shows that with a smaller threshold on the estimated uncertainty, we prune more of the points with large 3D error. Fig. 5 shows the cumulative 3D error plots. It illustrates that thresholding the estimated uncertainty gives us some control over the upper bound of the 3D error. As a result, we are able to trade off the number of points for 3D accuracy by varying the threshold level. This is shown in Fig. 6.
To compare the timings, all methods are implemented in MATLAB and run on a laptop CPU (Intel i7-4810MQ, 2.8 GHz). Tab. 1 provides the relative speed of our two-view RANSAC compared to [43]. It shows that ours is faster, especially when the point is far and the outlier ratio is high. This demonstrates the advantage of the early termination of two-view triangulation (Alg. 2). In Tab. 2, we present the timings of the local optimization and uncertainty estimation. We found that DLT is slightly faster than LinLS and almost twice as fast as GN.
Tab. 1. Relative speed of our two-view RANSAC compared to [43]:
4.15, 4.15 | 4.43, 4.32 | 4.81, 4.40 | 5.90, 4.47
4.61, 4.39 | 4.80, 4.45 | 4.78, 4.08 | 7.33, 4.79
5.28, 4.56 | 5.64, 4.72 | 5.87, 4.38 | 10.4, 5.22
7.46, 5.09 | 8.06, 5.30 | 9.13, 4.93 | 19.9, 6.45
28.6, 7.64 | 31.8, 8.14 | 40.8, 8.86 | 72.6, 13.1

Tab. 2. Timings of the local optimization and uncertainty estimation:
DLT  | LinLS | GN   | Uncertainty Est.
1.57 | 1.62  | 3.06 | 1.64
1.18 | 1.27  | 2.39 | 1.44
0.84 | 0.95  | 1.78 | 1.25
0.57 | 0.64  | 1.17 | 1.04
0.25 | 0.29  | 0.50 | 0.42
4.3 DLT vs. Gauss-Newton Method
To compare the accuracy of DLT and GN more closely, we perform additional simulations in outlier-free scenarios. The simulation is set up in a similar way as in Sect. 4.1 (see the supplementary material for details).
In terms of 3D accuracy, we found that the two methods perform almost equally most of the time. The comparison is inconsistent only when the maximum parallax angle is very small (less than 6 deg or so). We show this result in the supplementary material.
As for the 2D accuracy, the difference is sometimes noticeable. Fig. 7 shows the mean and the maximum difference of 2D error. On average, GN offers less gain for more cameras, smaller noise and lower parallax. This explains why we could not see the difference between DLT and GN in Fig. 3. However, the bottom row of Fig. 7 reveals that GN sometimes provides a significant gain over DLT even when the average difference is small.
4.4 Results on Real Data
We evaluate our method on three real datasets: Dinosaur [1], Corridor [1] and Notre Dame [45]. We only consider the points that are visible in three or more views. In our algorithm, we use the same parameters as in Sect. 4.2 and discard any point that is visible in fewer than three views after RANSAC. Fig. 8 shows the 3D reconstruction results.
5 Conclusions
In this work, we presented a robust and efficient method for multiview triangulation and uncertainty estimation. We proposed several early termination criteria for twoview RANSAC using the midpoint method, and showed that it improves the efficiency when the outlier ratio is high. We also compared the three local optimization methods (DLT, LinLS and GN), and found that DLT and GN are similar (but better than LinLS) in terms of 3D accuracy, while GN is sometimes much more accurate than DLT in terms of 2D accuracy. Finally, we proposed a novel method to estimate the uncertainty of a triangulated point based on the number of (inlying) views, the mean reprojection error and the maximum parallax angle. We showed that the estimated uncertainty can be used to control the 3D accuracy. An extensive evaluation was performed on both synthetic and real data.
References
 [1] Oxford Multiview Datasets. http://www.robots.ox.ac.uk/~vgg/data/datamview.html.
 [2] K. Aftab and R. Hartley. Convergence of iteratively reweighted least squares to robust M-estimators. In IEEE Winter Conf. Appl. Comput. Vis., pages 480–487, 2015.
 [3] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building Rome in a day. Commun. ACM, 54(10):105–112, 2011.

 [4] S. Agarwal, N. Snavely, and S. M. Seitz. Fast algorithms for $L_\infty$ problems in multiview geometry. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1–8, 2008.
 [5] C. Aholt, S. Agarwal, and R. Thomas. A QCQP approach to triangulation. In Eur. Conf. Comput. Vis., pages 654–667, 2012.
 [6] P. A. Beardsley, A. Zisserman, and D. W. Murray. Sequential updating of projective and affine structure from motion. Int. J. Comput. Vis., 23(3):235–259, 1997.
 [7] M. Byröd, K. Josephson, and K. Åström. Fast optimal three view triangulation. In Asian Conf. Comput. Vis., pages 549–559, 2007.
 [8] O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Pattern Recognition, pages 236–243, 2003.
 [9] A. Concha and J. Civera. DPPTAM: Dense piecewise planar tracking and mapping from a monocular sequence. In IEEE/RSJ Int. Conf. Intell. Robots. Syst., pages 5686–5693, 2015.
 [10] Z. Dai, Y. Wu, F. Zhang, and H. Wang. A novel fast method for $L_\infty$ problems in multiview geometry. In Eur. Conf. Comput. Vis., pages 116–129, 2012.
 [11] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.
 [12] C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza. SVO: Semi-direct visual odometry for monocular and multi-camera systems. IEEE Trans. Robot., 33(2):249–265, 2017.
 [13] S. H. Lee and J. Civera. Geometric interpretations of the normalized epipolar error. CoRR, abs/2008.01254, 2020.
 [14] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2003.
 [15] R. I. Hartley and F. Schaffalitzky. $L_\infty$ minimization in geometric reconstruction problems. In IEEE Conf. Comput. Vis. Pattern Recognit., 2004.
 [16] R. I. Hartley and P. Sturm. Triangulation. Comput. Vis. Image Underst., 68(2):146–157, 1997.
 [17] J. Hedborg, A. Robinson, and M. Felsberg. Robust three-view triangulation done fast. In IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pages 152–157, 2014.
 [18] D. C. Herrera, K. Kim, J. Kannala, K. Pulli, and J. Heikkilä. DT-SLAM: Deferred triangulation for robust SLAM. In IEEE Int. Conf. 3D Vis., pages 609–616, 2014.
 [19] F. Kahl, S. Agarwal, M. K. Chandraker, D. Kriegman, and S. Belongie. Practical global optimization for multiview geometry. Int. J. Comput. Vis., 79(3):271–284, 2008.
 [20] F. Kahl and R. Hartley. Multiple-view geometry under the $L_\infty$-norm. IEEE Trans. Pattern Anal. Mach. Intell., 30(9):1603–1617, 2008.
 [21] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstruction problems. Int. J. Comput. Vis., 74(1):3–15, 2007.
 [22] K. Kanatani, Y. Sugaya, and H. Niitsuma. Triangulation from two views revisited: HartleySturm vs. optimal correction. In Brit. Mach. Vis. Conf., pages 173–182, 2008.
 [23] L. Kang, L. Wu, and Y.H. Yang. Robust multiview triangulation via optimal inlier selection and 3D structure refinement. Pattern Recognition, 47(9):2974 – 2992, 2014.
 [24] Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruction. IEEE Trans. Pattern Anal. Mach. Intell., 29(10):1834–1847, 2007.
 [25] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In IEEE/ACM Int. Symp. Mixed Augmented Reality, 2007.
 [26] Z. Kukelova, T. Pajdla, and M. Bujnak. Fast and stable algebraic solution to three-view triangulation. In IEEE Int. Conf. 3D Vis., pages 326–333, 2013.
 [27] S. H. Lee and J. Civera. Closed-form optimal two-view triangulation based on angular errors. In IEEE Int. Conf. Comput. Vis., pages 2681–2689, 2019.
 [28] S. H. Lee and J. Civera. Triangulation: Why optimize? In Brit. Mach. Vis. Conf., 2019.
 [29] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale. Keyframebased visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res., 34(3):314–334, 2015.
 [30] H. Li. A practical algorithm for $L_\infty$ triangulation with outliers. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1–8, 2007.
 [31] P. Lindstrom. Triangulation made easy. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1554–1561, 2010.
 [32] H. C. LonguetHiggins. A computer algorithm for reconstructing a scene from two projections. Nature, 293(5828):133–135, 1981.
 [33] F. Lu and R. Hartley. A fast optimal algorithm for $L_2$ triangulation. In Asian Conf. Comput. Vis., pages 279–288, 2007.
 [34] P. Moulon, P. Monasse, and R. Marlet. Global fusion of relative motions for robust, accurate and scalable structure from motion. In IEEE Int. Conf. Comput. Vis., pages 3248–3255, 2013.
 [35] R. Mur-Artal and J. D. Tardós. Probabilistic semi-dense mapping from highly accurate feature-based monocular SLAM. In Proc. Robot.: Sci. Syst., 2015.

 [36] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Trans. Robot., 33(5):1255–1262, 2017.
 [37] J. Oliensis. Exact two-image structure from motion. IEEE Trans. Pattern Anal. Mach. Intell., 24(12):1618–1633, 2002.
 [38] C. Olsson, A. Eriksson, and R. Hartley. Outlier removal using duality. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 1450–1457, 2010.
 [39] T. Qin, P. Li, and S. Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot., 34(4):1004–1020, 2018.
 [40] R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J. Frahm. USAC: A universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):2022–2038, 2013.
 [41] S. Ramalingam, S. K. Lodha, and P. Sturm. A generic structure-from-motion framework. Comput. Vis. Image Underst., 103(3):218–228, 2006.
 [42] T. Sattler, B. Leibe, and L. Kobbelt. Fast imagebased localization using direct 2Dto3D matching. In Int. Conf. Comput. Vis., pages 667–674, 2011.
 [43] J. L. Schönberger and J. Frahm. Structure-from-Motion revisited. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 4104–4113, 2016.
 [44] K. Sim and R. Hartley. Removing outliers using the $L_\infty$ norm. In IEEE Conf. Comput. Vis. Pattern Recognit., volume 1, pages 485–494, 2006.
 [45] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph., 25(3):835–846, 2006.
 [46] H. Stewénius, F. Schaffalitzky, and D. Nister. How hard is 3view triangulation really? In IEEE Int. Conf. on Comput. Vis., volume 1, pages 686–693, 2005.
 [47] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IEEE/RSJ Int. Conf. Intell. Robots. Syst., 2012.

 [48] L. Svärm, O. Enqvist, M. Oskarsson, and F. Kahl. Accurate localization and pose estimation for large 3D models. In IEEE Conf. Comput. Vis. Pattern Recognit., pages 532–539, 2014.
 [49] P. Torr, A. Zisserman, and S. Maybank. Robust detection of degenerate configurations while estimating the fundamental matrix. Comput. Vis. Image Underst., 71(3):312–333, 1998.
 [50] P. H. S. Torr and A. Zisserman. Robust computation and parameterization of multiple view relations. In IEEE Int. Conf. Comput. Vis., 1998.
 [51] K. Wolff, C. Kim, H. Zimmer, C. Schroers, M. Botsch, O. SorkineHornung, and A. SorkineHornung. Point cloud noise and outlier removal for imagebased 3D reconstruction. In Int. Conf. 3D Vis., 2016.
 [52] K. Yang, W. Fang, Y. Zhao, and N. Deng. Iteratively reweighted midpoint method for fast multiple view triangulation. IEEE Robot. Autom. Lett., 4(2):708–715, 2019.