1 Introduction
The study of Structure from Motion (SfM) has made rapid progress in recent years and has achieved great success on small- to medium-scale scenes. However, reconstructing large-scale datasets remains a major challenge in terms of both efficiency and robustness.
Since the milestone work of [DBLP:conf/iccv/AgarwalSSSS09], incremental approaches have been widely used in modern SfM applications [DBLP:conf/cvpr/SnavelySS08, DBLP:conf/3dim/Wu13, DBLP:conf/accv/MoulonMM12, DBLP:conf/mm/SweeneyHT15, DBLP:conf/cvpr/SchonbergerF16]. Geometric filtering combined with RANSAC [DBLP:journals/cacm/FischlerB81] removes outliers effectively. Starting from a robust initial seed reconstruction, incremental SfM then registers cameras one by one via PnP [DBLP:conf/cvpr/KneipSS11, DBLP:journals/ijcv/LepetitMF09]. After each camera is registered, an additional bundle adjustment step optimizes both camera poses and 3D points [DBLP:conf/iccvw/TriggsMHF99]. This makes incremental SfM robust and accurate, but it also makes incremental SfM struggle on large-scale datasets: the repetitive bundle adjustment [DBLP:conf/iccvw/TriggsMHF99] is inefficient, and the memory requirement becomes a bottleneck. Besides, adding new views incrementally makes this kind of approach prone to drift, even when an additional re-triangulation step is used [DBLP:conf/3dim/Wu13].

Global SfM approaches [DBLP:conf/cvpr/Govindu01, DBLP:conf/eccv/WilsonS14, DBLP:journals/pami/CrandallOSH13, DBLP:conf/iccv/CuiT15, DBLP:conf/iccv/ChatterjeeG13, DBLP:conf/eccv/HavlenaTP10, DBLP:conf/eccv/WilsonBS16, Govindu2006Robustness, Govindu2004Lie, DBLP:conf/cvpr/OzyesilS15, Moulon2013Global] have an efficiency advantage over incremental ones. Once all available relative motions are obtained, global approaches first recover global rotations by solving the rotation averaging problem efficiently and robustly [DBLP:conf/cvpr/Govindu01, DBLP:conf/cvpr/Govindu04, DBLP:journals/ijcv/HartleyTDL13, DBLP:conf/cvpr/HartleyAT11, DBLP:conf/iccv/ChatterjeeG13, DBLP:journals/pami/ChatterjeeG18, DBLP:conf/cvpr/ErikssonOKC18, DBLP:conf/eccv/WilsonBS16]
. Then, the global orientations and relative translations are used to estimate camera translations (or camera centers) by translation averaging [DBLP:conf/eccv/WilsonS14, DBLP:conf/cvpr/OzyesilS15, DBLP:conf/eccv/GoldsteinHLVS16, DBLP:journals/corr/abs190100643]. With camera poses known, triangulation (re-triangulation might be required) recovers the 3D points, after which only a single bundle adjustment step is required. Though global approaches are efficient, their shortcomings are obvious: translation averaging is hard to solve, as relative translations encode only the direction of translation while the scale is unknown, and outliers remain a vexing problem for translation averaging, which is the main reason that prohibits the practical use of global SfM approaches.

To overcome the inefficiency of incremental SfM while retaining its robustness, a natural idea is to perform reconstruction in a divide-and-conquer manner. A pioneering work that proposed this idea is [DBLP:conf/accv/BhowmickPCGB14], where images are first partitioned by graph cut and the sub-reconstructions are stitched by similarity transformations. It was followed by [Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18], where the advantages of both incremental and global approaches are exploited within each sub-reconstruction. However, these divide-and-conquer approaches focus mainly on the local reconstructions, and their pipelines lack global consideration, which may lead to the failure of SfM.
Inspired by this outstanding divide-and-conquer work [DBLP:conf/accv/BhowmickPCGB14, DBLP:conf/3dim/SweeneyFHT16, Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18], we solve large-scale SfM in a parallel mode, with the whole pipeline designed as a unified framework based on graph theory. The proposed framework starts from a global perspective: the image clustering step is designed to serve both robust local reconstruction and the final sub-reconstruction merging step, in which each cluster is deemed a node inside a graph, and the merging step further exploits the cluster graph structure to obtain robust fusion results. More specifically, images are first divided into non-overlapping clusters, each cluster forming a graph node. Second, the edges lost during clustering are collected and used to construct a maximum spanning tree (MaxST); these lost edges are added back along the MaxST to create overlapping images and strengthen the connections between clusters. Third, local SfM solvers are executed in parallel or distributed mode. At last, after all local SfM jobs finish, a novel sub-reconstruction merging algorithm registers the clusters: the most accurate similarity transformations are selected within a minimum spanning tree (MinST), and a minimum height tree (MHT) is constructed to find a suitable reference frame and suppress the accumulated error.
Our contributions are mainly threefold:

We propose a robust image clustering algorithm, where images are clustered into groups of suitable size with overlap, and the connectivity between clusters is enhanced with the help of a MaxST.

We propose a novel graph-based sub-model merging algorithm, where a MinST is constructed to find accurate similarity transformations and an MHT is constructed to avoid error accumulation during the merging process.

The time complexity of our approach is linear in the number of images, while most state-of-the-art algorithms are quadratic.
2 Related Work
A notable line of work in large-scale reconstruction is hierarchical SfM [Farenzena2009Structure, Gherardi2010Improving, Toldo2015Hierarchical, DBLP:conf/3dim/NiD12, DBLP:journals/cviu/ChenCLSW17]. These approaches take each image as a leaf node, and point clouds and camera poses are merged from bottom to top. The principle of "smallest first" is adopted to produce a balanced dendrogram, which makes hierarchical approaches insensitive to initialization and drift error. However, due to insufficient feature matching [DBLP:journals/ijcv/Lowe04, DBLP:journals/pr/MaJJG19], the reconstructed scenes tend to lose detail and become incomplete. Besides, the quality of the reconstruction can be ruined by the selection of overly similar image pairs.
Some earlier work tried to solve large-scale SfM with multiple cores [DBLP:conf/iccv/AgarwalSSSS09], or to reduce the burden of exhaustive pairwise matching by building skeletal graphs [DBLP:conf/cvpr/SnavelySS08]. Bhowmick et al. [DBLP:conf/accv/BhowmickPCGB14] tried to solve large-scale SfM in a divide-and-conquer manner, adopting graph cut [DBLP:journals/pami/DhillonGK07, DBLP:journals/pami/ShiM00] for data partitioning. After all sub-reconstructions complete, additional cameras are registered in each sub-reconstruction to construct overlapping areas and then fuse them. This was improved in [DBLP:conf/3dim/SweeneyFHT16], which clusters the dataset and merges each cluster using a distributed camera model [DBLP:conf/eccv/SweeneyFHT14, DBLP:conf/3dim/SweeneyFHT16]. However, both [DBLP:conf/3dim/SweeneyFHT16] and [DBLP:conf/accv/BhowmickPCGB14] either gave little consideration to the graph clustering strategy or neglected a careful design of the clustering and merging algorithms, which makes the reconstruction fragile: the loss of connections between components weakens the result, and the similarity score used as the weight in graph partitioning reduces its reliability. Another drawback worth noting is that the incremental merging process suffers from drift errors, just as traditional incremental approaches do.
Following the work of Bhowmick et al. [DBLP:conf/accv/BhowmickPCGB14], [Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18] augment the graph cut process of [DBLP:conf/accv/BhowmickPCGB14, DBLP:conf/3dim/SweeneyFHT16] with a two-step process: binary graph cut and graph expansion. In their work, the graph cut and graph expansion steps alternate until both the size constraint and the completeness constraint are satisfied. Then, components are registered by global motion averaging [Zhu2017Parallel]. However, translation averaging at the cluster level still suffers from outliers and may lead to disconnected models. This work was further improved in [DBLP:conf/cvpr/ZhuZZSFTQ18], which adopted the cluster registration approach of [DBLP:conf/accv/BhowmickPCGB14] and then divided camera poses into intra-cameras and inter-cameras for motion averaging, improving the convergence rate of the final bundle adjustment [DBLP:conf/iccvw/TriggsMHF99, DBLP:conf/cvpr/ErikssonBCI16, DBLP:conf/iccvw/RamamurthyLAPV17, DBLP:conf/iccv/ZhangZFQ17].
3 Graph-Based Structure from Motion
To deal with large-scale datasets, we adopt a divide-and-conquer strategy similar to [DBLP:conf/accv/BhowmickPCGB14, Zhu2017Parallel]. For the sake of both completeness and efficiency of reconstruction, we propose a unified graph framework to solve the image clustering and sub-reconstruction merging problems. The pipeline of our SfM algorithm is shown in Fig. 1. First, we extract features and use them for matching; epipolar geometries are estimated to filter matching outliers. After feature matching, our proposed image clustering algorithm divides the images into different groups. Then, the clusters are reconstructed by local SfM in parallel. After all local reconstructions are merged with our graph-based merging algorithm, further re-triangulation and bundle adjustment steps can be performed alternately. We describe the details of our algorithm in the following subsections.
3.1 Image Clustering
We aim to group images into clusters such that each cluster fits within the memory limitation of the computer. Besides, each cluster should be reconstructed as accurately as possible and should not be overly affected by the loss of geometric constraints. In this section, we present a simple but effective two-step image clustering approach: (1) graph cutting; (2) image expansion based on a maximum spanning tree. Both steps are grounded in elementary graph theory. In addition, we adopt the two conditions proposed in [Zhu2017Parallel] to constrain the clustering steps: the size constraint and the completeness constraint. The size constraint gives an upper bound on the number of images in each cluster, while the completeness constraint requires that clusters share enough common images to be merged reliably. Unlike the image clustering algorithm proposed in [Zhu2017Parallel], which alternates between graph cut and graph expansion, we perform graph cut and image expansion only once each. The novelty of our expansion step is the use of a MaxST to assist the final fusion step.
3.1.1 Graph Cutting
In the graph cutting step, each image is deemed a graph node, and each edge represents the connection between two images, given by the two-view geometry; the weight of an edge is the number of matches remaining after geometric filtering (we refer to these as image edges). Considering the size constraint, each cluster should have a similar size, so the image clustering problem can be solved by graph cut [DBLP:journals/pami/DhillonGK07, DBLP:journals/pami/ShiM00]. To enhance the connections between clusters and later align them, an additional expansion step is executed: we expand the independent clusters with some common images (referred to as the overlapping area) and then compute similarity transformations to fuse them together. As the iterative approach of Zhu [Zhu2017Parallel] is time-consuming, we propose a one-step expansion procedure in the next subsection.
3.1.2 Image Expansion
In the image expansion step, we generalize the graph to the cluster level. Each cluster is represented as a node, and the edges between clusters are the edges lost during graph cut (we refer to these as cluster edges). First, we collect all the lost edges for every pair of clusters. Then, we construct a cluster graph in which the weight of a cluster edge is the number of lost image edges between the two clusters. Intuitively, the more image edges are lost between a pair of clusters, the more we prefer to reconnect them to avoid losing information. With that in mind, once we obtain the cluster graph, a MaxST is constructed to guide the expansion step. We gather the image edges along the MaxST and sort them in descending order of weight; the top-ranked lost image edges are then added back to clusters whose completeness constraint is not yet satisfied. At last, we check all clusters, and for any whose completeness constraint is still not satisfied, cluster edges not contained in the MaxST are selected at random and their image edges are added in the same way.
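As an illustration, the MaxST over the cluster graph can be built with a standard Kruskal pass over the lost-edge counts. The following is a minimal pure-Python sketch; the cluster names and weights are made up for illustration:

```python
def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def maximum_spanning_tree(nodes, edges):
    """Kruskal on descending weights; edges are (weight, u, v) with
    weight = number of lost image edges between clusters u and v."""
    parent = {n: n for n in nodes}
    tree = []
    for w, u, v in sorted(edges, reverse=True):   # heaviest edge first
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                              # no cycle introduced
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Toy cluster graph: 4 clusters; weights count the lost image edges.
clusters = ["A", "B", "C", "D"]
lost = [(12, "A", "B"), (3, "A", "C"), (9, "B", "C"), (7, "C", "D"), (2, "B", "D")]
maxst = maximum_spanning_tree(clusters, lost)
# Expansion then walks the MaxST edges and re-adds the strongest lost
# image edges between each connected cluster pair.
```

The expansion itself (choosing which individual image edges to restore between a connected cluster pair) follows the descending-weight order described above.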
The procedure of our image clustering algorithm is shown in Fig. 2. In Fig. 2(a), the image graph is first partitioned by the graph cut algorithm, where weakly connected edges tend to be removed. Fig. 2(b) shows the cluster graph after graph cutting, where nodes are clusters, edges are the edges lost from the image graph, and the edge weights are the numbers of lost edges. In Fig. 2(c), the solid lines represent the edges of the constructed maximum spanning tree, while the dotted lines could be added to enhance the connectivity of the clusters. Fig. 2(d) shows the final expanded image clusters. The complete image clustering algorithm is given in Alg. 1.
3.2 Graph-Based Local Reconstruction Merging
After image clustering, each cluster can be reconstructed by a local SfM approach; we choose incremental SfM for its robustness to outliers. As the number of images in each cluster is bounded by a threshold, the drift problem is alleviated. When all clusters are reconstructed, a final step is needed to stitch them together, since each cluster has its own local coordinate frame.
To construct a robust merging algorithm, we consider three main problems:

A cluster should be selected as the reference frame, which we refer to as the anchor node.

The merging step from other clusters to the anchor node should be as accurate as possible.

As there may be no overlap between the anchor node and some other clusters, we have to find a path along which to merge them into the anchor node. Due to accumulated errors, the path from each cluster to the anchor node shouldn't be too long.
To deal with the above problems, we construct a graph at the cluster level. The algorithm is composed of three main steps: (1) cluster graph initialization; (2) anchor node searching; (3) path computation and simplification. For cluster graph initialization, we first find the common cameras between pairwise clusters and compute the similarity transformations between them. Then we build a minimum spanning tree (MinST) to select the most accurate edges. We find the anchor node by solving a minimum height tree (MHT) [DBLP:journals/dam/LaberN04] problem. We first show how the edge selection can be cast as a MinST problem.
3.2.1 Pairwise Similarity Transformation
We discussed how to construct overlapping areas in Sec. 3.1; we now utilize the overlap information to compute the pairwise similarity transformation. Given correspondences of camera poses between two clusters, with camera centers $\{\mathbf{c}_i\}$ in one cluster and $\{\mathbf{c}_i'\}$ in the other, we first estimate the relative scale. With the relative scale known, the similarity estimation degenerates to a Euclidean estimation.
Relative Scale Estimation
To estimate the relative scale, we need at least two point correspondences. Given camera centers $\mathbf{c}_i, \mathbf{c}_j$ and their correspondences $\mathbf{c}_i', \mathbf{c}_j'$, we can estimate the relative scale for the pair $(i, j)$ by

$$s_{ij} = \frac{\|\mathbf{c}_i - \mathbf{c}_j\|}{\|\mathbf{c}_i' - \mathbf{c}_j'\|}. \qquad (1)$$

As there may exist outliers, we choose $s$ robustly over all pairs as

$$s = \operatorname{median}_{\,i \neq j}\; s_{ij}. \qquad (2)$$
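A minimal sketch of this robust scale estimation in pure Python; the point values below are made up, and taking the median of the pairwise ratios is assumed as the robust estimator:

```python
import math
from itertools import combinations
from statistics import median

def relative_scale(centers_a, centers_b):
    """Scale s such that inter-camera distances in A ~ s * distances in B,
    estimated as the median over all pairwise ratios for robustness."""
    ratios = []
    for i, j in combinations(range(len(centers_a)), 2):
        da = math.dist(centers_a[i], centers_a[j])
        db = math.dist(centers_b[i], centers_b[j])
        if db > 1e-9:                    # skip degenerate (coincident) pairs
            ratios.append(da / db)
    return median(ratios)

# Cluster B is cluster A scaled by 1/2, except one corrupted center.
a = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (2, 2, 0)]
b = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (9, 9, 9)]  # outlier last
s = relative_scale(a, b)   # recovers 2.0 despite the outlier
```

With 5 correspondences, a single corrupted center contaminates only 4 of the 10 pairwise ratios, so the median is unaffected.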
Euclidean Transformation Estimation
When the relative scale is known, the similarity transformation degenerates to a Euclidean estimation; that is, we only need to estimate the relative rotation and relative translation. Suppose a 3D point $\mathbf{X}$ in the global coordinate frame is mapped to $\mathbf{x}_1$ and $\mathbf{x}_2$ in the local coordinate frames of the two clusters by two Euclidean transformations $(\mathbf{R}_1, \mathbf{t}_1)$ and $(\mathbf{R}_2, \mathbf{t}_2)$, respectively. Then we have

$$\mathbf{x}_1 = \mathbf{R}_1 \mathbf{X} + \mathbf{t}_1, \qquad \mathbf{x}_2 = \mathbf{R}_2 \mathbf{X} + \mathbf{t}_2. \qquad (3)$$

We can further obtain

$$\mathbf{x}_2 = \mathbf{R}_2 \mathbf{R}_1^{\top} (\mathbf{x}_1 - \mathbf{t}_1) + \mathbf{t}_2. \qquad (4)$$

Then, the relative transformation $(\mathbf{R}_{12}, \mathbf{t}_{12})$ is

$$\mathbf{R}_{12} = \mathbf{R}_2 \mathbf{R}_1^{\top}, \qquad \mathbf{t}_{12} = \mathbf{t}_2 - \mathbf{R}_2 \mathbf{R}_1^{\top} \mathbf{t}_1. \qquad (5)$$

Because the two clusters differ by the scale $s$, we reformulate the relative translation using corresponding camera centers as

$$\mathbf{t}_{12} = \mathbf{c}' - s\, \mathbf{R}_{12}\, \mathbf{c}, \qquad (6)$$

where $\mathbf{c}$ and $\mathbf{c}'$ are corresponding camera centers in the two clusters, respectively. To handle the remaining outliers, we combine the Euclidean estimation with RANSAC.
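For intuition, here is a 2D analogue of this step in pure Python. The 3D case is typically solved with an SVD-based Kabsch/Umeyama fit, and the estimation above is additionally wrapped in RANSAC; the points below are made up:

```python
import math

def euclidean_2d(src, dst, s):
    """Estimate angle theta and translation t so that
    dst_k ~ R(theta) * (s * src_k) + t  (2D illustration of the 3D case)."""
    n = len(src)
    # Pre-scale the source points, then work with centered coordinates.
    sp = [(s * x, s * y) for x, y in src]
    cx_s = sum(p[0] for p in sp) / n; cy_s = sum(p[1] for p in sp) / n
    cx_d = sum(p[0] for p in dst) / n; cy_d = sum(p[1] for p in dst) / n
    # Closed-form least-squares rotation from cross/dot correlations.
    num = den = 0.0
    for (px, py), (qx, qy) in zip(sp, dst):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_d, qy - cy_d
        num += ax * by - ay * bx        # cross terms
        den += ax * bx + ay * by        # dot terms
    theta = math.atan2(num, den)
    # Translation maps the rotated, scaled source centroid onto the target's.
    c, si = math.cos(theta), math.sin(theta)
    t = (cx_d - (c * cx_s - si * cy_s), cy_d - (si * cx_s + c * cy_s))
    return theta, t

# Points rotated by 90 degrees, scaled by 2, shifted by (1, 1).
src = [(0, 0), (1, 0), (0, 1)]
dst = [(1, 1), (1, 3), (-1, 1)]
theta, t = euclidean_2d(src, dst, s=2.0)   # theta = pi/2, t = (1, 1)
```

In 3D the same centroid-and-correlation structure appears, with the rotation recovered from the SVD of the 3x3 cross-covariance matrix instead of a single angle.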
3.2.2 Cluster Graph Initialization
In our approach, each cluster is deemed a graph node, and edges connect nodes that share common cameras. Assume the cluster graph $G$ contains clusters $\{V_1, \dots, V_n\}$, and denote by $p_{ij}$ the probability of obtaining a good transformation from the cluster pair $(V_i, V_j)$. Consequently, the probability that all edges in a spanning tree $T$ can be reconstructed is approximated as

$$p(T) = \prod_{e_{ij} \in T} p_{ij}, \qquad (7)$$

where $V_i$ and $V_j$ are the two clusters associated with edge $e_{ij}$ of $T$. $p(T)$ can be considered the probability that a global 3D reconstruction can be reached, provided that all spanning pairs are correctly reconstructed. We then maximize the probability defined in Eq. (7), which is equivalent to minimizing the cost function

$$c(T) = -\log p(T) = -\sum_{e_{ij} \in T} \log p_{ij}. \qquad (8)$$

To solve for the optimal spanning tree, we define the weight of the edge connecting clusters $V_i$ and $V_j$ as

$$w_{ij} = -\log p_{ij}. \qquad (9)$$

Now the problem of maximizing the joint probability is converted into finding a MinST in an undirected graph. Note that in MinST computation, the concrete values of the edge weights do not matter, but their order does. That is, reasonably comparable strengths of the connections between clusters, rather than an accurate estimate of $p_{ij}$, are sufficient to generate a good spanning tree. This observation leads us to the following residual-error weight definition scheme.
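The tree selection can be sketched as follows (pure Python; the pairwise probabilities below are invented for illustration):

```python
import math

def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's MinST: take edges in ascending weight, skipping cycles."""
    parent = {n: n for n in nodes}
    tree = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

# p[(i, j)]: probability that the pairwise transform (i, j) is good.
p = {(0, 1): 0.9, (0, 2): 0.5, (1, 2): 0.8, (2, 3): 0.95}
edges = [(-math.log(q), u, v) for (u, v), q in p.items()]  # w = -log p
tree = minimum_spanning_tree([0, 1, 2, 3], edges)
# The MinST keeps the high-probability edges and drops the weak (0, 2) link.
```

Because only the ordering of the weights matters, any monotone surrogate for the probabilities (such as the residual error below) can replace the exact values.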
Residual Error
As a reliable measure of the goodness of cluster merging, we use the Mean Square Distance (MSD) to help define the edge weight. The Mean Square Error (MSE) from cluster $V_i$ to cluster $V_j$ is defined as

$$\mathrm{MSE}_{ij} = \frac{1}{m} \sum_{k=1}^{m} \big\| \mathbf{S}_{ij}(\mathbf{x}_k^{i}) - \mathbf{x}_k^{j} \big\|^2, \qquad (10)$$

where $\mathbf{S}_{ij}$ is the similarity transformation from cluster $V_i$ to cluster $V_j$, and $\mathbf{x}_k^{i}$ is the $k$-th of the $m$ common points in cluster $V_i$. Eq. (10) describes the transformation error from cluster $V_i$ to cluster $V_j$. To convert the MSE into a symmetric metric, we take the maximum of the two directional MSEs to define the MSD:

$$\mathrm{MSD}_{ij} = \max\big(\mathrm{MSE}_{ij}, \mathrm{MSE}_{ji}\big). \qquad (11)$$

Then the edge weight between vertices $V_i$ and $V_j$ in $G$ is defined as $w_{ij} = \mathrm{MSD}_{ij}$.
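A small sketch of the symmetric weight in pure Python; the transforms and points are toy values chosen only to exercise both directions:

```python
def mse(transform, pts_src, pts_dst):
    """Mean squared distance after mapping pts_src into the other frame."""
    n = len(pts_src)
    return sum(
        sum((a - b) ** 2 for a, b in zip(transform(p), q))
        for p, q in zip(pts_src, pts_dst)
    ) / n

def msd(t_ij, t_ji, pts_i, pts_j):
    """Symmetric edge weight: max of the two directional MSEs."""
    return max(mse(t_ij, pts_i, pts_j), mse(t_ji, pts_j, pts_i))

# Toy similarity transforms on 3D points: t_ij doubles, t_ji halves.
t_ij = lambda p: tuple(2 * c for c in p)
t_ji = lambda p: tuple(c / 2 for c in p)
pts_i = [(1, 0, 0), (0, 1, 0)]
pts_j = [(2, 0, 0), (0, 2, 1)]       # second correspondence slightly off
w = msd(t_ij, t_ji, pts_i, pts_j)
```

Taking the maximum makes the weight symmetric, so the MinST does not depend on the (arbitrary) direction in which each pairwise transform was estimated.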
3.2.3 Minimum Height Tree Construction
After computing all the edge weights, the graph initialization is complete. We then construct a MinST using Kruskal's algorithm to select the most accurate similarity transformations. After finding the MinST, we need a base node to serve as the reference frame (the anchor node) for the global alignment of all clusters in the MinST. We impose two restrictions on the selection of the base node: (1) the base node should be suitably large; (2) the paths from the other nodes to the base node shouldn't be too long. The first constraint is for efficiency; the second avoids error accumulation. Taking an idea similar to the Minimum Height Tree (MHT) of [DBLP:journals/dam/LaberN04], we convert the problem of finding the base node into an MHT problem. We first introduce the concept of an MHT.
Definition 1.
For an undirected graph with tree characteristics, we can choose any node as the root. The resulting graph is then a rooted tree. Among all possible rooted trees, those with minimum height are called minimum height trees (MHTs).
We solve the MHT problem by peeling off leaf nodes layer by layer. At each layer, we collect all the leaf nodes and merge them into their neighbors. At the end, one or two nodes remain. If two nodes remain, we choose the one of larger size as the base node; if only one node remains, it is the base node.
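The layer-by-layer peeling is the classic minimum-height-tree algorithm; a pure-Python sketch (nodes labeled 0..n-1, tree given as an edge list):

```python
from collections import defaultdict

def mht_roots(n, edges):
    """Peel leaves layer by layer; the last one or two nodes are the
    minimum-height-tree roots (candidate base nodes)."""
    if n == 1:
        return [0]
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [u for u in range(n) if len(adj[u]) == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:                 # remove this layer of leaves
            nb = adj[leaf].pop()
            adj[nb].remove(leaf)
            if len(adj[nb]) == 1:
                next_leaves.append(nb)
        leaves = next_leaves
    return leaves                           # one or two candidates

# A chain of 5 clusters 0-1-2-3-4: the middle cluster minimizes height.
roots = mht_roots(5, [(0, 1), (1, 2), (2, 3), (3, 4)])   # [2]
```

When two candidates remain, the tie-break described above (pick the cluster of larger size) selects the base node.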
The merging process is depicted in Fig. 3, and the advantage of using the MHT to find the base node is depicted in Fig. 4. Owing to the robustness of our algorithm, which finds accurate similarity transformations and filters out edges with large residual error, we are able to merge all sub-reconstructions accurately. The full sub-model merging algorithm is illustrated in Alg. 2.
4 Experiments
In this section, we evaluate our GraphSfM on various kinds of datasets, including ambiguous datasets and large-scale aerial datasets.
4.1 Experimental Environments
Our GraphSfM algorithm is implemented on top of COLMAP [DBLP:conf/cvpr/SchonbergerF16] and tested on several kinds of datasets. All experiments are performed on a PC with a 4-core Intel 7700 CPU and 32 GB RAM. Besides, we use SIFT [DBLP:journals/ijcv/Lowe04] to extract feature points for all the evaluated SfM approaches.
4.2 Datasets Overview
To evaluate the robustness and efficiency of our algorithm, we first construct and collect several kinds of datasets. The first kind is collected from 9 outdoor scenes, including small- and medium-scale datasets with image counts ranging from 60 to 2248. The second kind comes from public datasets, including outdoor scenes (Gerrard Hall, Person Hall, South Building) [DBLP:conf/cvpr/SchonbergerF16] and ambiguous scenes (Stadium and Heaven Temple) [DBLP:conf/eccv/ShenZFZQ16]. The last kind consists of 3 large-scale aerial datasets, whose memory requirements and efficiency are challenging for traditional approaches.
4.3 Efficiency and Robustness Evaluation
We compare the efficiency of our algorithm against two state-of-the-art incremental SfM approaches (TheiaSfM [DBLP:conf/mm/SweeneyHT15] and COLMAP [DBLP:conf/cvpr/SchonbergerF16]) and two state-of-the-art global SfM approaches (1DSfM [DBLP:conf/eccv/WilsonS14] and LUD [DBLP:conf/cvpr/OzyesilS15]). For the sake of fairness, our GraphSfM runs on one computer, though it can run in a distributed mode. The evaluation results are shown in Fig. 5 and Table 1. It is not surprising that the incremental approaches take more time for reconstruction than the global ones. As the dataset scale increases, the time taken by COLMAP [DBLP:conf/cvpr/SchonbergerF16] grows rapidly, due to the repetitive and time-consuming bundle adjustment [DBLP:conf/iccvw/TriggsMHF99] step. Though our approach is an incremental one, the number of images in each cluster is bounded by a constant size; thus the cost of bundle adjustment is greatly reduced, and the total time grows linearly with the number of images. Though TheiaSfM [DBLP:conf/mm/SweeneyHT15] is also an incremental SfM, it selects good tracks [DBLP:journals/pr/CuiSH17] for bundle adjustment, which saves a lot of time but can become unstable in some cases; moreover, the time taken by TheiaSfM surpasses our GraphSfM once the number of images exceeds 2000. Table 1 gives more details of the reconstruction results. Our GraphSfM is as robust as COLMAP in terms of reconstructed cameras and more accurate than the other approaches in terms of reprojection error. These facts illustrate the superior ability of our GraphSfM to handle large-scale datasets. We emphasize that our algorithm runs on only one computer here, and the reconstruction time could be largely reduced by running on more computers in a distributed manner.
For each method, Nc is the number of registered cameras, Np the number of reconstructed 3D points, Err the mean reprojection error, and Time the running time. Methods: COLMAP [DBLP:conf/cvpr/SchonbergerF16], TheiaSfM [DBLP:conf/mm/SweeneyHT15], 1DSfM [DBLP:conf/eccv/WilsonS14], LUD [DBLP:conf/cvpr/OzyesilS15].

| Dataset | Images | COLMAP Nc | Np | Err | Time | TheiaSfM Nc | Np | Err | Time | 1DSfM Nc | Np | Err | Time | LUD Nc | Np | Err | Time | Ours Nc | Np | Err | Time |
| DS60 | 60 | 60 | 16387 | 0.478 | 26.22 | 60 | 8956 | 1.915 | 10.934 | 60 | 8979 | 1.923 | 1.317 | 60 | 8979 | 1.923 | 1.360 | 60 | 13923 | 0.456 | 24.48 |
| DS158 | 158 | 158 | 68989 | 0.420 | 170.34 | 158 | 39506 | 1.911 | 87.711 | 157 | 39527 | 1.918 | 7.758 | 158 | 39517 | 1.917 | 7.951 | 158 | 62020 | 0.438 | 168.48 |
| DS214 | 214 | 214 | 71518 | 0.512 | 122.64 | 138 | 6459 | 1.704 | 45.888 | 187 | 7080 | 1.539 | 4.248 | 162 | 5099 | 1.454 | 1.691 | 214 | 68882 | 0.487 | 121.56 |
| DS319 | 319 | 319 | 154702 | 0.498 | 529.14 | 204 | 11550 | 1.796 | 186.078 | 290 | 142967 | 1.821 | 17.525 | 270 | 13484 | 1.755 | 18.774 | 319 | 151437 | 0.473 | 482.4 |
| DS401 | 401 | 370 | 166503 | 0.584 | 568.68 | 305 | 23742 | 1.967 | 241.609 | 348 | 23081 | 1.886 | 18.891 | 316 | 22160 | 1.848 | 17.931 | 370 | 164495 | 0.552 | 562.74 |
| DS628 | 628 | 628 | 268616 | 0.394 | 562.74 | 628 | 133300 | 1.918 | 421.233 | 610 | 133146 | 1.908 | 34.803 | 628 | 133747 | 1.910 | 35.029 | 628 | 259333 | 0.388 | 605.58 |
| DS704 | 704 | 703 | 345677 | 0.575 | 1918.86 | 449 | 35659 | 1.861 | 603.85 | 641 | 42716 | 1.934 | 108.154 | 547 | 34296 | 1.908 | 97.8192 | 703 | 346394 | 0.546 | 1839.9 |
| DS999 | 999 | 980 | 419471 | 0.523 | 1918.86 | 733 | 40246 | 1.859 | 731.842 | 745 | 172864 | 1.769 | 77.798 | 611 | 31254 | 1.742 | 70.775 | 980 | 416512 | 0.504 | 2570.34 |
| DS2248 | 2248 | 2248 | 1609026 | 0.634 | 71,736 | 2248 | 187392 | 2.474 | 7255.700 | 2247 | 188102 | 2.475 | 667.585 | 2248 | 188134 | 2.474 | 694.736 | 2242 | 1445227 | 0.650 | 6,108.06 |
4.4 Evaluation on Public Datasets
We further evaluated our algorithm on several public datasets [DBLP:conf/cvpr/SchonbergerF16, DBLP:conf/eccv/ShenZFZQ16]. For these small-scale datasets, we run our GraphSfM on only one computer. Some visual results are shown in Fig. 6, and statistics are given in Table 2. COLMAP is again the least efficient approach, and our approach is 1.2 to 3 times faster than COLMAP even on a single computer. TheiaSfM selects good tracks [DBLP:journals/pr/CuiSH17] for optimization, and the two global approaches are the most efficient. However, as shown in Fig. 6, both global approaches failed on the Person Hall and Guangzhou Stadium datasets, which shows that global approaches are easily disturbed by outliers. Though an incremental approach, TheiaSfM also failed on these two datasets. Our approach is as robust as COLMAP, yet more efficient.
Ambiguous Datasets
Reconstructing ambiguous datasets is challenging for SfM approaches: though feature matches are filtered by geometric constraints, many wrong matches still pass the verification step. As shown in Fig. 7, our GraphSfM shows advantages over traditional SfM approaches on this kind of dataset. Thanks to the image clustering step, some wrong edges are discarded between clusters, so the individual reconstructions are not affected by the wrong matches. In traditional SfM approaches, however, wrong matches are hard to detect, especially in self-similar datasets or datasets with repeated structures, which is the major reason for their failure on ambiguous datasets.
For each baseline, Nc is the number of registered cameras, Np the number of 3D points, and Time the running time. For Ours, Tc, Tr, and Tm are the times of image clustering, local reconstruction, and merging, and Time is the total. Methods: COLMAP [DBLP:conf/cvpr/SchonbergerF16], TheiaSfM [DBLP:conf/mm/SweeneyHT15], 1DSfM [DBLP:conf/eccv/WilsonS14], LUD [DBLP:conf/cvpr/OzyesilS15].

| Dataset | Images | COLMAP Nc | Np | Time | TheiaSfM Nc | Np | Time | 1DSfM Nc | Np | Time | LUD Nc | Np | Time | Ours Nc | Np | Tc | Tr | Tm | Time |
| Gerrard Hall | 100 | 100 | 42795 | 303.066 | 100 | 50232 | 93.346 | 99 | 49083 | 15.821 | 100 | 44844 | 13.816 | 100 | 42274 | 0.010 | 114 | 3.848 | 118.68 |
| Person Hall | 330 | 330 | 141629 | 1725.798 | 113 | 39101 | 157.416 | 42 | 6239 | 768.752 | 325 | 93386 | 107.558 | 330 | 140859 | 0.040 | 713.94 | 25.128 | 742.92 |
| South Building | 128 | 128 | 61151 | 303.06 | 128 | 68812 | 155.844 | 128 | 436640 | 27.711 | 128 | 69110 | 34.695 | 128 | 58483 | 0.032 | 125.28 | 4.745 | 131.28 |
| Stadium | 157 | 157 | 85723 | 418.74 | 30 | 6345 | 18.729 | 65 | 6319 | 5.555 | 77 | 4549 | 4.736 | 157 | 71605 | 0.026 | 403.86 | 16.167 | 421.62 |
| Heaven Temple | 341 | 341 | 185750 | 8678.76 | 336 | 1201 | 46039 | 339 | 13356 | 40.959 | 340 | 14019 | 44.089 | 341 | 181583 | 0.044 | 2784.378 | 46.737 | 2856.258 |
4.5 Evaluation on Large Scale Aerial Datasets
Our approach is also evaluated on large-scale aerial datasets, both on one computer sequentially (each cluster reconstructed one by one) and on three computers in parallel. The reconstruction results are given in Table 3. Our algorithm recovers the same number of cameras as COLMAP. Besides, when running on a single computer, our approach is about 6 times faster than COLMAP; when running on three computers in a distributed manner, it is about 17 times faster. Our algorithm can be accelerated further with more computing resources. TheiaSfM is slightly slower than our approach, and the number of reconstructed 3D points is one order of magnitude smaller than ours. Though 1DSfM and LUD are still the most efficient, their robustness is challenged by large-scale aerial datasets.
For each baseline, Nc is the number of registered cameras, Np the number of 3D points, and Time the running time. For Ours, T1 and T3 are the running times on one computer (sequential) and on three computers (distributed), respectively. Methods: COLMAP [DBLP:conf/cvpr/SchonbergerF16], TheiaSfM [DBLP:conf/mm/SweeneyHT15], 1DSfM [DBLP:conf/eccv/WilsonS14], LUD [DBLP:conf/cvpr/OzyesilS15].

| Dataset | Images | COLMAP Nc | Np | Time | TheiaSfM Nc | Np | Time | 1DSfM Nc | Np | Time | LUD Nc | Np | Time | Ours Nc | Np | T1 | T3 |
| Aerial5155 | 5155 | 5155 | 1798434 | 41823.27 | 4383 | 203942 | 3527.43 | 4591 | 243490 | 482.45 | 4723 | 278924 | 390.59 | 5155 | 1834875 | 2491.78 | 936.74 |
| Aerial7500 | 7500 | 7455 | 5184368 | 95007.59 | 5327 | 432347 | 6237.91 | 6264 | 478234 | 931.84 | 5934 | 467230 | 832.40 | 7455 | 4968142 | 5834.37 | 2166.59 |
| Aerial12306 | 12306 | 11259 | 3934391 | 146172 | 8347 | 478237 | 25783.12 | 8923 | 509543 | 4941.27 | 8534 | 489238 | 4589.73 | 11259 | 3916724 | 22663.86 | 8970.01 |
5 Conclusion
In this article, we proposed a new SfM pipeline called GraphSfM, which is based on graph theory, and we designed a unified framework to solve large-scale SfM tasks. Our two-step graph clustering algorithm enhances the connections between clusters with the help of a MaxST. In the final fusion step, the construction of the MinST and MHT allows us to pick the most accurate similarity transformations and to alleviate error accumulation. Thus, our GraphSfM is highly efficient and robust on large-scale datasets, and it also shows superiority on ambiguous datasets compared with traditional state-of-the-art SfM approaches. Moreover, GraphSfM can easily be deployed on a distributed system, so the reconstruction is not limited by the scale of the dataset.