Graph-Based Parallel Large Scale Structure from Motion

12/23/2019 ∙ by Yu Chen, et al. ∙ Peking University

While Structure from Motion (SfM) has achieved great success in 3D reconstruction, it still faces challenges on large-scale scenes. In this work, large-scale SfM is cast as a graph problem, which we tackle in a divide-and-conquer manner. First, an image clustering algorithm divides the images into clusters with strong connectivity, leading to robust local reconstructions. An image expansion step then enhances the connectivity and completeness of the scene by expanding along a maximum spanning tree. After the local reconstructions, we construct a minimum spanning tree (MinST) to find accurate similarity transformations. The MinST is then transformed into a minimum height tree (MHT) to find a proper anchor node, and is further utilized to prevent error accumulation. When evaluated on different kinds of datasets, our approach shows superiority over the state of the art in both accuracy and efficiency. Our algorithm is open-sourced at




1 Introduction

The study of SfM has made rapid progress in recent years. It has achieved great success in small to medium scale scenes. However, reconstructing large scale datasets remains a big challenge in terms of both efficiency and robustness.

Since the milestone work of [DBLP:conf/iccv/AgarwalSSSS09], incremental approaches have been widely used in modern SfM applications [DBLP:conf/cvpr/SnavelySS08, DBLP:conf/3dim/Wu13, DBLP:conf/accv/MoulonMM12, DBLP:conf/mm/SweeneyHT15, DBLP:conf/cvpr/SchonbergerF16]. Geometric filtering combined with RANSAC [DBLP:journals/cacm/FischlerB81] removes outliers effectively. Starting from a robust initial seed reconstruction, incremental SfM then adds cameras one by one via PnP [DBLP:conf/cvpr/KneipSS11, DBLP:journals/ijcv/LepetitMF09]. After each camera is registered, an additional bundle adjustment step optimizes both poses and 3D points [DBLP:conf/iccvw/TriggsMHF99]. This makes incremental SfM robust and accurate, but also makes it struggle on large-scale datasets: the repetitive optimization by bundle adjustment [DBLP:conf/iccvw/TriggsMHF99] is inefficient, and the memory requirement becomes a bottleneck. Besides, the manner of adding new views incrementally makes these approaches prone to drift, even when an additional re-triangulation step is used [DBLP:conf/3dim/Wu13].

Global SfM approaches [DBLP:conf/cvpr/Govindu01, DBLP:conf/eccv/WilsonS14, DBLP:journals/pami/CrandallOSH13, DBLP:conf/iccv/CuiT15, DBLP:conf/iccv/ChatterjeeG13, DBLP:conf/eccv/HavlenaTP10, DBLP:conf/eccv/WilsonBS16, Govindu2006Robustness, Govindu2004Lie, DBLP:conf/cvpr/OzyesilS15, Moulon2013Global] have an efficiency advantage over incremental ones. Once all available relative motions are obtained, global approaches first recover global rotations by solving the rotation averaging problem efficiently and robustly [DBLP:conf/cvpr/Govindu01, DBLP:conf/cvpr/Govindu04, DBLP:journals/ijcv/HartleyTDL13, DBLP:conf/cvpr/HartleyAT11, DBLP:conf/iccv/ChatterjeeG13, DBLP:journals/pami/ChatterjeeG18, DBLP:conf/cvpr/ErikssonOKC18, DBLP:conf/eccv/WilsonBS16]. Then, the global orientations and relative translations are used to estimate camera translations (or camera centers) by translation averaging [DBLP:conf/eccv/WilsonS14, DBLP:conf/cvpr/OzyesilS15, DBLP:conf/eccv/GoldsteinHLVS16, DBLP:journals/corr/abs-1901-00643]. With known camera poses, triangulation (re-triangulation might be required) recovers the 3D points, after which only a single bundle adjustment step is needed. Though global approaches are efficient, their shortcomings are obvious: translation averaging is hard to solve, as relative translations encode only the direction of translation while the scale is unknown; and outliers remain a thorny problem for translation averaging, which is the main obstacle to the practical use of global SfM approaches.

To overcome the inefficiency of incremental SfM while retaining its robustness, a natural idea is to perform the reconstruction in a divide-and-conquer manner. A pioneering work along this line is [DBLP:conf/accv/BhowmickPCGB14], where images are first partitioned by graph cut and the sub-reconstructions are stitched by similarity transformations. It was followed by [Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18], where the advantages of both incremental and global approaches are utilized in each sub-reconstruction. However, these divide-and-conquer approaches focus mainly on the local reconstructions, and their pipelines lack a global consideration, which may lead to the failure of SfM.

Inspired by this previous outstanding divide-and-conquer work [DBLP:conf/accv/BhowmickPCGB14, DBLP:conf/3dim/SweeneyFHT16, Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18], we solve large-scale SfM in a parallel mode, with the whole pipeline designed as a unified framework based on graph theory. The proposed framework starts from a global perspective: the image clustering step is designed to serve both robust local reconstruction and the final sub-reconstruction merging step, in which each cluster is deemed a node of a graph, and the merging step can further exploit this cluster graph structure to obtain robust fusion results. More specifically, images are first divided into non-overlapping clusters, each of which becomes a graph node. Second, the edges lost in the cut are collected and used to construct a maximum spanning tree (MaxST); these lost edges are added back along the MaxST to create overlapping images and enhance the connections between clusters. Third, local SfM solvers are executed in parallel or distributed mode. Finally, after all local SfM jobs finish, a novel sub-reconstruction merging algorithm registers the clusters: the most accurate similarity transformations are selected within a minimum spanning tree (MinST), and a minimum height tree (MHT) is constructed to find a suitable reference frame and suppress accumulated error.

Our contributions are mainly three-fold:

  • We propose a robust image clustering algorithm, where images are clustered into overlapping groups of suitable size and the connectivity is enhanced with the help of a MaxST.

  • We propose a novel graph-based sub-model merging algorithm, where a MinST is constructed to find accurate similarity transformations and an MHT is constructed to avoid error accumulation during the merging process.

  • The time complexity of our approach is linear in the number of images, while most state-of-the-art algorithms are quadratic.

2 Related Work

Some exciting work on large-scale reconstruction comes from hierarchical SfM approaches [Farenzena2009Structure, Gherardi2010Improving, Toldo2015Hierarchical, DBLP:conf/3dim/NiD12, DBLP:journals/cviu/ChenCLSW17]. These approaches take each image as a leaf node; point clouds and camera poses are merged from bottom to top. The principle of "smallest first" is adopted to produce a balanced dendrogram, which makes hierarchical approaches insensitive to initialization and drift error. However, due to insufficient feature matching [DBLP:journals/ijcv/Lowe04, DBLP:journals/pr/MaJJG19], the reconstructed scenes tend to lose detail and become incomplete. Besides, the quality of the reconstruction can be harmed by the selection of similar image pairs.

Some earlier work tried to solve large-scale SfM via multiple cores [DBLP:conf/iccv/AgarwalSSSS09], or to reduce the burden of exhaustive pairwise matching by building skeletal graphs [DBLP:conf/cvpr/SnavelySS08]. Bhowmick et al. [DBLP:conf/accv/BhowmickPCGB14] tried to solve large-scale SfM in a divide-and-conquer manner, adopting graph cut [DBLP:journals/pami/DhillonGK07, DBLP:journals/pami/ShiM00] for the data partition. After all sub-reconstructions complete, additional cameras are registered in each sub-reconstruction to construct overlapping areas, which are then used to fuse them. This was improved in [DBLP:conf/3dim/SweeneyFHT16], which clusters the dataset and merges each cluster by a distributed camera model [DBLP:conf/eccv/SweeneyFHT14, DBLP:conf/3dim/SweeneyFHT16]. However, both [DBLP:conf/3dim/SweeneyFHT16, DBLP:conf/accv/BhowmickPCGB14] either gave little consideration to the graph clustering strategy or neglected a careful design of the clustering and merging algorithms, which makes the reconstruction fragile and prone to drift. Moreover, the loss of connections between different components makes the reconstruction fragile, and using the similarity score as the weight in graph partitioning reduces the reliability of the result. Another drawback worth noting is that the incremental merging process suffers from drift, just as traditional incremental approaches do.

Following the work of Bhowmick et al. [DBLP:conf/accv/BhowmickPCGB14], [Zhu2017Parallel, DBLP:conf/cvpr/ZhuZZSFTQ18] augment the graph cut process of [DBLP:conf/accv/BhowmickPCGB14, DBLP:conf/3dim/SweeneyFHT16] with a two-step process: binary graph cut and graph expansion. In their work, the graph cut and graph expansion steps alternate and converge when both the size constraint and the completeness constraint are satisfied. Components are then registered by global motion averaging [Zhu2017Parallel]. However, translation averaging at the cluster level still suffers from outliers and may lead to disconnected models. This work was further improved in [DBLP:conf/cvpr/ZhuZZSFTQ18], which adopted the cluster registration approach of [DBLP:conf/accv/BhowmickPCGB14] and divided camera poses into intra-cameras and inter-cameras for motion averaging, improving the convergence rate of the final bundle adjustment [DBLP:conf/iccvw/TriggsMHF99, DBLP:conf/cvpr/ErikssonBCI16, DBLP:conf/iccvw/RamamurthyLAPV17, DBLP:conf/iccv/ZhangZFQ17].

3 Graph-Based Structure from Motion

To deal with large-scale datasets, we adopt a divide-and-conquer strategy similar to [DBLP:conf/accv/BhowmickPCGB14, Zhu2017Parallel]. For the sake of completeness and efficiency of reconstruction, we propose a unified graph framework to solve the image clustering and sub-reconstruction merging problems. The pipeline of our SfM algorithm is shown in Fig. 1. First, we extract features and match them; epipolar geometries are estimated to filter matching outliers. After feature matching, our proposed image clustering algorithm divides the images into groups, and the clusters are reconstructed by local SfM in parallel. After all local reconstructions are merged with our graph-based merging algorithm, re-triangulation and bundle adjustment can be performed alternately. We describe the details of our algorithm in the following subsections.

Figure 1: Our proposed SfM pipeline. After feature matching, our image clustering algorithm divides the images into groups; the clusters are then reconstructed by local SfM in parallel. After all local reconstructions are merged with our graph-based merging algorithm, re-triangulation and bundle adjustment can be performed alternately.

3.1 Images Clustering

We aim to group images into clusters such that each cluster fits within the memory limitation of the computer. Besides, each cluster should be reconstructed as accurately as possible and not be unduly influenced by the loss of geometric constraints. In this section, we present a simple but effective two-step image clustering approach: (1) graph cutting, and (2) image expansion based on a maximum spanning tree. Both steps are grounded in intuitive graph theory. We additionally adopt the two constraints proposed in [Zhu2017Parallel]: the size constraint, which gives an upper bound on the number of images in each cluster, and the completeness constraint, defined as in [Zhu2017Parallel]. Unlike the image clustering algorithm of [Zhu2017Parallel], which alternates between graph cut and graph expansion, we perform graph cut once and image expansion once, and the novelty of our expansion step lies in using a MaxST to assist the final fusion step.

3.1.1 Graph Cutting

In the graph cutting step, each image is deemed a graph node, and each edge represents the connection between two images, given by their two-view geometry; the weight of an edge is the number of matches remaining after geometric filtering (we refer to these as image edges). Considering the size constraint, each cluster should have a similar size, so the image clustering problem can be solved by graph cut [DBLP:journals/pami/DhillonGK07, DBLP:journals/pami/ShiM00]. To enhance the connections between clusters and to align them together, an additional expansion step is executed: we expand the independent clusters with some common images (referred to as the overlapping area), which are later used to compute the similarity transformations that fuse the clusters. As the iterative approach of Zhu [Zhu2017Parallel] is time-consuming, we propose a one-step expansion procedure in the next subsection.

3.1.2 Images Expansion

In the image expansion step, we generalize the graph to the cluster level: each cluster is a node, and the edges between clusters are the edges lost in the graph cut (referred to as cluster edges). First, we collect all the lost edges for each cluster pair. Then we construct the cluster graph, in which the weight of a cluster edge is the number of lost image edges between the two clusters. Intuitively, the more image edges are lost between a pair of clusters, the more we prefer to reconnect them to avoid a loss of information. With that in mind, once the cluster graph is obtained, a MaxST is constructed to guide the expansion step. We gather the image edges along the MaxST and sort them in descending order of weight, then add the lost image edges back to the clusters whose completeness constraint is not satisfied, keeping only the top-k edges. Finally, we check all clusters; for cluster edges not contained in the MaxST, we select them randomly and add their image edges in a similar way to any cluster that still violates the completeness constraint.

The procedure of our image clustering algorithm is shown in Fig. 2. In Fig. 2(a), the image graph is first partitioned by the graph cut algorithm, where weakly connected edges tend to be removed. Fig. 2(b) shows the cluster graph after graph cutting, where nodes are clusters, edges are the lost edges of the image graph, and the edge weights are the numbers of lost edges. In Fig. 2(c), the solid lines represent the edges of the constructed maximum spanning tree; the dotted lines could be added to enhance the connectivity of clusters. Fig. 2(d) shows the final expanded image clusters. The complete image clustering algorithm is given in Alg. 1.

Figure 2: The image clustering procedure. (a) The image graph is partitioned by NCut, where weakly connected edges tend to be removed. (b) The cluster graph after graph cutting, where nodes are clusters, edges are the lost edges of the image graph, and the edge weights are the numbers of lost edges. (c) The solid lines represent the edges of the constructed maximum spanning tree; the dotted lines could be added to enhance the connectivity of clusters. (d) The final expanded image clusters.
Input: an initial image graph G; the maximum cluster size; the completeness ratio; the number of overlapping images between two clusters; the number of images.
Output: image clusters with intersection.
1: clusters ← GraphPartition(G, maximum cluster size)
2: Collect the lost edges
3: Build the cluster graph from the lost edges
4: T ← MaxST(cluster graph)
5: while T is not empty do
6:     pop an edge (C_i, C_j) from T
7:     add the top-k lost image edges between C_i and C_j
8: while the completeness constraint is not satisfied do
9:     select an edge (C_i, C_j) from the cluster graph randomly
10:    add the top-k lost image edges between C_i and C_j
Algorithm 1: Image Clustering Algorithm
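To make the two-step procedure concrete, here is a minimal Python sketch using networkx. The names are illustrative: recursive Kernighan-Lin bisection stands in for the normalized-cut partitioner, and the random completeness-repair loop of Alg. 1 is omitted.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def partition(G, max_size):
    """Recursively bisect G until every part is at most max_size images.
    (Stand-in for the normalized-cut partitioner used in the paper.)"""
    if G.number_of_nodes() <= max_size:
        return [set(G.nodes)]
    a, b = kernighan_lin_bisection(G, weight="weight", seed=7)
    return partition(G.subgraph(a), max_size) + partition(G.subgraph(b), max_size)

def cluster_and_expand(G, max_size, top_k):
    clusters = partition(G, max_size)
    # Image edges lost by the cut: their endpoints fall into different clusters.
    label = {v: i for i, c in enumerate(clusters) for v in c}
    lost = [(u, v, d["weight"]) for u, v, d in G.edges(data=True)
            if label[u] != label[v]]
    # Cluster graph: one node per cluster, weight = number of lost image edges.
    C = nx.Graph()
    C.add_nodes_from(range(len(clusters)))
    for u, v, w in lost:
        i, j = label[u], label[v]
        C.add_edge(i, j, weight=C.get_edge_data(i, j, {"weight": 0})["weight"] + 1)
    # Expand along a maximum spanning tree of the cluster graph: copy the
    # endpoints of the strongest top_k lost edges across the cut.
    for i, j in nx.maximum_spanning_tree(C, weight="weight").edges:
        edges_ij = sorted((e for e in lost if {label[e[0]], label[e[1]]} == {i, j}),
                          key=lambda e: e[2], reverse=True)
        for u, v, _ in edges_ij[:top_k]:
            clusters[i].add(v if label[v] == j else u)
            clusters[j].add(u if label[u] == i else v)
    return clusters
```

After expansion, the clusters overlap, which is what later provides the common cameras for the similarity transformations.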

3.2 Graph-based Local Reconstructions Merging

After image clustering, each cluster can be reconstructed by a local SfM approach; we choose incremental SfM for its robustness to outliers. As the number of images in each cluster is bounded by a threshold, the drift problem is alleviated. When all clusters are reconstructed, a final step is needed to stitch them together, since each cluster has its own local coordinate frame.

To construct a robust merging algorithm, we consider three main problems:

  • A cluster should be selected as the reference frame, which we refer to as the anchor node.

  • The merging step from other clusters to the anchor node should be as accurate as possible.

  • As there may be no overlap between the anchor node and some other clusters, we have to find a path along which to merge them into the anchor node. Due to accumulated errors, the path from each cluster to the anchor node shouldn’t be too long.

To deal with the above problems, we construct a graph at the cluster level. The algorithm is composed of three main steps: (1) cluster graph initialization, (2) anchor node searching, and (3) path computation and simplification. For cluster graph initialization, we first find the common cameras between pairwise clusters and compute the similarity transformations. Then we build a minimum spanning tree (MinST) to select the most accurate edges, and find the anchor node by solving a minimum height tree (MHT) [DBLP:journals/dam/LaberN04] problem. We first show how the problem can be cast as a MinST problem.

3.2.1 Pairwise Similarity Transformation

We discussed how to construct overlapping areas in Sec. 3.1; we now utilize this overlap to compute the pairwise similarity transformation. Given correspondences of camera poses between two clusters, we first estimate the relative scale. With the relative scale known, the similarity estimation degenerates to a Euclidean estimation.

Relative Scale Estimation

To estimate the relative scale, we need at least two point correspondences. Given two corresponding point pairs $(x_a, x_a')$ and $(x_b, x_b')$ from the two clusters, we can estimate the relative scale by

$$ s = \frac{\| x_a - x_b \|}{\| x_a' - x_b' \|}. $$

As there may exist outliers, we choose $s$ robustly over all such pairwise estimates (e.g., as the median).
Euclidean Transformation Estimation

When the relative scale is known, the similarity transformation degenerates to a Euclidean estimation; that is, we only need to estimate the relative rotation and the relative translation. Suppose a 3D point $X$ in the global coordinate frame is mapped to $x_i$ and $x_j$ in the local coordinate frames of clusters $i$ and $j$ by two Euclidean transformations respectively. Then we have

$$ x_i = R_i X + t_i, \qquad x_j = R_j X + t_j. $$

We can further obtain

$$ x_i = R_i R_j^\top x_j + (t_i - R_i R_j^\top t_j). $$

Then the relative transformation is

$$ R_{ij} = R_i R_j^\top, \qquad t_{ij} = t_i - R_i R_j^\top t_j. $$

Because clusters $i$ and $j$ are related up to a scale $s$, we reformulate the relative translation as

$$ t_{ij} = c_i - s R_{ij} c_j, $$

where $c_i$ and $c_j$ are corresponding camera centers in clusters $i$ and $j$ respectively. To handle the existence of outliers, we combine the Euclidean estimation with RANSAC.
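The two estimation steps above can be sketched as follows, assuming the corresponding camera centers of the two clusters are given as (N, 3) numpy arrays. The closed-form Kabsch/Procrustes solution is used for the Euclidean part, and the median of pairwise distance ratios stands in for the robust scale estimate; the RANSAC wrapper mentioned in the text is omitted.

```python
import numpy as np

def relative_scale(C1, C2):
    """Median ratio of pairwise point distances (robust to a few outliers).
    C1, C2: (N, 3) arrays of corresponding camera centers in the two clusters."""
    i, j = np.triu_indices(len(C1), k=1)
    d1 = np.linalg.norm(C1[i] - C1[j], axis=1)
    d2 = np.linalg.norm(C2[i] - C2[j], axis=1)
    return np.median(d1 / d2)

def euclidean_transform(C1, C2):
    """Closed-form R, t aligning C2 to C1 (Kabsch / orthogonal Procrustes)."""
    m1, m2 = C1.mean(0), C2.mean(0)
    U, _, Vt = np.linalg.svd((C1 - m1).T @ (C2 - m2))
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    R = U @ D @ Vt
    t = m1 - R @ m2
    return R, t

def similarity_transform(C1, C2):
    """Return s, R, t such that C1 ≈ s * (C2 @ R.T) + t."""
    s = relative_scale(C1, C2)
    R, t = euclidean_transform(C1, s * C2)
    return s, R, t
```

In practice this closed-form fit would be run inside a RANSAC loop over minimal samples, keeping the transform with the largest inlier set.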

3.2.2 Cluster Graph Initialization

In our approach, each cluster is deemed a graph node, and edges connect nodes that share common cameras. Assume the cluster graph has $m$ clusters, and denote by $p_{ij}$ the probability of obtaining a good transformation from the cluster pair $(i, j)$. Consequently, the probability that all edges in a spanning tree $T$ can be reconstructed is approximated as

$$ p(T) = \prod_{e_k \in T} p_{i_k j_k}, \tag{7} $$

where $i_k$ and $j_k$ are the two clusters associated with the $k$-th edge of $T$. $p(T)$ can be considered the probability that a global 3D reconstruction can be reached, provided all spanning pairs are correctly reconstructed. We then try to maximize the probability defined in Equ. (7), which is equivalent to minimizing the cost function

$$ c(T) = -\log p(T) = \sum_{e_k \in T} -\log p_{i_k j_k}. \tag{8} $$

To solve for the optimal spanning tree, we define the weight of the edge connecting clusters $i$ and $j$ as

$$ w_{ij} = -\log p_{ij}. \tag{9} $$

Now the problem of maximizing the joint probability is converted into finding a MinST in an undirected graph. Note that in MinST computation, the concrete values of the edge weights do not matter, but their order does. That is, reasonably comparable strengths of connections between clusters, rather than an accurate estimate of $p_{ij}$, are sufficient to generate a good spanning tree. This observation leads us to the following residual-error weight definition scheme.

Residual Error

As a reliable measure of the goodness of cluster merging, we use the mean square distance (MSD) to help define the edge weight. The mean square error (MSE) from cluster $i$ to cluster $j$ is defined as:

$$ \mathrm{mse}(i, j) = \frac{1}{N} \sum_{k=1}^{N} \| S_{ij} x_k^i - x_k^j \|^2, \tag{10} $$

where $S_{ij}$ is the similarity transformation from cluster $i$ to cluster $j$, and $x_k^i$ is the $k$-th common point in cluster $i$. Equ. (10) describes the transformation error from cluster $i$ to cluster $j$. To convert the MSE into a symmetric metric, we use the maximum of the two directional MSEs to define the MSD:

$$ \mathrm{msd}(i, j) = \max\big(\mathrm{mse}(i, j), \mathrm{mse}(j, i)\big). \tag{11} $$

Then the edge weight between vertices $i$ and $j$ is defined as $w_{ij} = \mathrm{msd}(i, j)$.
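A minimal sketch of the cluster-graph initialization under these definitions. The data layout is an assumption for illustration: each similarity transformation is a 4x4 homogeneous matrix and the common points are (N, 3) arrays.

```python
import numpy as np
import networkx as nx

def mse(points_src, points_dst, S):
    """Mean square distance after mapping the common points of the source
    cluster into the destination cluster with the 4x4 similarity S."""
    P = np.hstack([points_src, np.ones((len(points_src), 1))])
    mapped = (P @ S.T)[:, :3]
    return np.mean(np.sum((mapped - points_dst) ** 2, axis=1))

def build_cluster_graph(pairs):
    """pairs: dict {(i, j): (pts_i, pts_j, S_ij, S_ji)} for overlapping clusters.
    Edge weight = MSD = max of the two directional MSEs."""
    G = nx.Graph()
    for (i, j), (pi, pj, S_ij, S_ji) in pairs.items():
        msd = max(mse(pi, pj, S_ij), mse(pj, pi, S_ji))
        G.add_edge(i, j, weight=msd)
    return G

def best_merge_tree(G):
    # A Kruskal minimum spanning tree keeps the most accurate transformations.
    return nx.minimum_spanning_tree(G, weight="weight", algorithm="kruskal")
```

Since only the ordering of the weights matters for the MinST, the raw MSD values can be used directly without calibrating them against the probabilities $p_{ij}$.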

3.2.3 Minimum Height Tree Construction

After computing all the edge weights, the graph initialization is complete. We can then construct a MinST with Kruskal's algorithm to select the most accurate similarity transformations. After finding the MinST, we need a base node to serve as the reference for the global alignment of all clusters in the MinST. We impose two restrictions on the selection of the base node: (1) the base node should be suitably large; (2) the paths from the other nodes to the base node shouldn’t be too long. The first constraint is for efficiency; the second avoids error accumulation. Following the idea of the minimum height tree (MHT) in [DBLP:journals/dam/LaberN04], we convert the problem of finding the base node into an MHT problem. We first introduce the concept of the MHT.

Definition 1.

For an undirected graph with tree characteristics, we can choose any node as the root. The resulting graph is then a rooted tree. Among all possible rooted trees, those with minimum height are called minimum height trees (MHTs).

We solve the MHT problem by peeling the leaf nodes layer by layer. At each layer, we collect all the leaf nodes and merge them into their neighbors. At the end, one or two nodes are left: if two nodes are left, we choose the one with the larger size as the base node; if only one node is left, it is the base node.
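The layer-by-layer peeling can be sketched as follows (names are illustrative: `edges` is the MinST edge list, `sizes` maps each cluster node to its image count, and the input is assumed to be a tree):

```python
from collections import defaultdict

def find_base_node(edges, sizes):
    """Peel the leaves of a tree layer by layer; of the last one or two
    remaining nodes, pick the larger cluster as the base (reference) node."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(adj) | set(sizes)
    while len(remaining) > 2:
        # Current layer of leaves (degree <= 1); remove them all at once.
        leaves = [n for n in remaining if len(adj[n]) <= 1]
        for leaf in leaves:
            for nbr in adj[leaf]:
                adj[nbr].discard(leaf)
            adj.pop(leaf, None)
            remaining.discard(leaf)
    return max(remaining, key=lambda n: sizes.get(n, 0))
```

In the full pipeline, the order in which the leaves are peeled also gives the merge order: each peeled cluster is transformed into its neighbor before the neighbor itself is peeled.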

The merging process is depicted in Fig. 3, and the advantage of using this algorithm to find the base node is depicted in Fig. 4. Owing to the robustness of our algorithm, which finds accurate similarity transformations and filters out edges with large residual errors, we are able to merge all sub-reconstructions accurately. The full sub-model merging algorithm is illustrated in Alg. 2.

Figure 3: The sub-reconstruction merging process. (a) shows the constructed MinST, where dotted lines represent edges with large residual errors. In (b), the leaf nodes (green) are merged into their neighbors. In (c), the nodes merged in the first layer are marked yellow, and the new leaf nodes are merged in turn. In (d), the leaf nodes merged in the first layer are marked dark yellow and those merged in the second layer yellow; as only two nodes are left, the node with the larger size is chosen as the base node and the other is merged into it.
Figure 4: The alignment results with and without our graph-based local reconstructions merging algorithm.
Input: clusters, with the corresponding 3D points between each overlapping cluster pair.
Output: the final merged cluster.
1: Initialize the cluster graph
2: Construct a MinST from the cluster graph by Kruskal's algorithm
3: while the tree has more than one node do
4:     find all leaf nodes and their connected nodes
5:     merge each leaf node into its neighbor and remove it
6:     if only two nodes remain then
7:         select the cluster with the larger size as the base node
8:         merge and remove the other node
9:         break
Algorithm 2: Graph-Based Local Reconstructions Merging Algorithm

4 Experiments

In this section, we evaluate our GraphSfM on different kinds of datasets, including ambiguous datasets and large scale aerial datasets.

4.1 Experimental Environments

Our GraphSfM algorithm is implemented on top of COLMAP [DBLP:conf/cvpr/SchonbergerF16] and tested on different kinds of datasets. All experiments are performed on a PC with a 4-core Intel 7700 CPU and 32 GB RAM. We use SIFT [DBLP:journals/ijcv/Lowe04] to extract feature points for all the evaluated SfM approaches.

4.2 Datasets Overview

To evaluate the robustness and efficiency of our algorithm, we first construct and collect several kinds of datasets. The first kind is collected from 9 outdoor scenes, covering small and medium scales, with the number of images ranging from 60 to 2248. The second kind comes from public datasets, including outdoor scenes (Gerrard Hall, Person Hall, South Building) [DBLP:conf/cvpr/SchonbergerF16] and ambiguous scenes (Stadium and Heaven Temple) [DBLP:conf/eccv/ShenZFZQ16]. The last kind consists of 3 large-scale aerial datasets, for which the memory requirement and efficiency are challenges for traditional approaches.

4.3 Efficiency and Robustness Evaluation

We evaluated the efficiency of our algorithm against 2 state-of-the-art incremental SfM approaches (TheiaSfM [DBLP:conf/mm/SweeneyHT15] and COLMAP [DBLP:conf/cvpr/SchonbergerF16]) and 2 state-of-the-art global SfM approaches (1DSfM [DBLP:conf/eccv/WilsonS14] and LUD [DBLP:conf/cvpr/OzyesilS15]). For the sake of fairness, our GraphSfM runs on one computer, though it can run in a distributed mode. The evaluation results are shown in Fig. 5 and Table 1. It is not surprising that incremental approaches take more time for reconstruction than global ones. As the dataset scale increases, the time taken by COLMAP [DBLP:conf/cvpr/SchonbergerF16] grows rapidly, due to the repetitive and time-consuming bundle adjustment [DBLP:conf/iccvw/TriggsMHF99] step. Though our approach is an incremental one, the number of images in each cluster is bounded by a constant, so the bundle adjustment time is greatly reduced and the total time grows linearly with the number of images. Though TheiaSfM [DBLP:conf/mm/SweeneyHT15] is also incremental, it selects good tracks [DBLP:journals/pr/CuiSH17] for bundle adjustment, which saves a lot of time but can become unstable in some cases; moreover, its runtime surpasses our GraphSfM once the number of images exceeds 2000. Table 1 gives more details of the reconstruction results: our GraphSfM is as robust as COLMAP in terms of reconstructed cameras and more accurate than the other approaches in terms of reprojection error. These facts illustrate the superior ability of our GraphSfM to handle large-scale datasets. We emphasize that our algorithm ran on just one computer, and the reconstruction time could be reduced considerably by running it on more computers in a distributed manner.

dataset Images COLMAP [DBLP:conf/cvpr/SchonbergerF16] TheiaSfM [DBLP:conf/mm/SweeneyHT15] 1DSfM [DBLP:conf/eccv/WilsonS14] LUD [DBLP:conf/cvpr/OzyesilS15] Ours
(for each method, the four columns are: recovered cameras | 3D points | Err | time in seconds)
DS-60 60 60 16387 0.478 26.22 60 8956 1.915 10.934 60 8979 1.923 1.317 60 8979 1.923 1.360 60 13923 0.456 24.48
DS-158 158 158 68989 0.420 170.34 158 39506 1.911 87.711 157 39527 1.918 7.758 158 39517 1.917 7.951 158 62020 0.438 168.48
DS-214 214 214 71518 0.512 122.64 138 6459 1.704 45.888 187 7080 1.539 4.248 162 5099 1.454 1.691 214 68882 0.487 121.56
DS-319 319 319 154702 0.498 529.14 204 11550 1.796 186.078 290 142967 1.821 17.525 270 13484 1.755 18.774 319 151437 0.473 482.4
DS-401 401 370 166503 0.584 568.68 305 23742 1.967 241.609 348 23081 1.886 18.891 316 22160 1.848 17.931 370 164495 0.552 562.74
DS-628 628 628 268616 0.394 562.74 628 133300 1.918 421.233 610 133146 1.908 34.803 628 133747 1.910 35.029 628 259333 0.388 605.58
DS-704 704 703 345677 0.575 1918.86 449 35659 1.861 603.85 641 42716 1.934 108.154 547 34296 1.908 97.8192 703 346394 0.546 1839.9
DS-999 999 980 419471 0.523 1918.86 733 40246 1.859 731.842 745 172864 1.769 77.798 611 31254 1.742 70.775 980 416512 0.504 2570.34
DS-2248 2248 2248 1609026 0.634 71,736 2248 187392 2.474 7255.700 2247 188102 2.475 667.585 2248 188134 2.474 694.736 2242 1445227 0.650 6,108.06
Table 1: Efficiency and accuracy evaluation on datasets of different scales. For each method, the four columns report the number of recovered cameras, the number of 3D points, the reprojection error (Err), and the reconstruction time in seconds, respectively. The best results are highlighted in bold font.
Figure 5: Efficiency evaluation on datasets with different scales.

4.4 Evaluation on Public Datasets

We evaluated our algorithm on several public datasets [DBLP:conf/cvpr/SchonbergerF16, DBLP:conf/eccv/ShenZFZQ16]. For these small-scale datasets, we ran GraphSfM on only one computer. Some visual results are shown in Fig. 6 and statistics are given in Table 2. COLMAP is again the least efficient approach; ours is 1.2 - 3 times faster, even on a single computer. TheiaSfM selects good tracks [DBLP:journals/pr/CuiSH17] for optimization, and the two global approaches are the most efficient. However, as shown in Fig. 6, both global approaches fail on the Person Hall and Guangzhou Stadium datasets, which shows that global approaches are easily disturbed by outliers. Though incremental, TheiaSfM also fails on Person Hall and Guangzhou Stadium. Our approach is as robust as COLMAP while being more efficient.

Figure 6: Reconstruction results on the public datasets. From top to bottom: Gerrard Hall, Person Hall, and Guangzhou Stadium.
Ambiguous Datasets

Reconstructing ambiguous datasets is challenging for SfM approaches. Though feature matches are filtered by geometric constraints, many wrong matches still pass the verification step. As shown in Fig. 7, our GraphSfM has advantages over traditional SfM approaches on this kind of dataset. Due to the image clustering step, some wrong edges are discarded across clusters, so the individual reconstructions are not affected by the wrong matches. By contrast, traditional SfM approaches find it hard to detect wrong matches, especially in self-similar datasets or datasets with repeated structures, which is the major reason for their failure on ambiguous datasets.

dataset Images COLMAP [DBLP:conf/cvpr/SchonbergerF16] TheiaSfM [DBLP:conf/mm/SweeneyHT15] 1DSfM [DBLP:conf/eccv/WilsonS14] LUD [DBLP:conf/cvpr/OzyesilS15] Ours
Gerrard Hall 100 100 42795 303.066 100 50232 93.346 99 49083 15.821 100 44844 13.816 100 42274 0.010 114 3.848 118.68
Person Hall 330 330 141629 1725.798 113 39101 157.416 42 6239 768.752 325 93386 107.558 330 140859 0.040 713.94 25.128 742.92
South Building 128 128 61151 303.06 128 68812 155.844 128 436640 27.711 128 69110 34.695 128 58483 0.032 125.28 4.745 131.28
Stadium 157 157 85723 418.74 30 6345 18.729 65 6319 5.555 77 4549 4.736 157 71605 0.026 403.86 16.167 421.62
Heaven Temple 341 341 185750 8678.76 336 1201 46039 339 13356 40.959 340 14019 44.089 341 181583 0.044 2784.378 46.737 2856.258
Table 2: Comparison of reconstruction results. For each method, the columns report the number of recovered cameras and the number of 3D points, followed by the time cost in seconds; for our method, the four time columns denote the graph clustering step, the local SfM step, the point cloud alignment step, and the total time, respectively.
(a) COLMAP [DBLP:conf/cvpr/SchonbergerF16]
(b) LUD [DBLP:conf/cvpr/OzyesilS15]
(c) GraphSfM
Figure 7: Reconstructions on the Heaven Temple dataset. (a): The result reconstructed by COLMAP [DBLP:conf/cvpr/SchonbergerF16]. (b): The result reconstructed by LUD [DBLP:conf/cvpr/OzyesilS15]. (c): The result reconstructed by our GraphSfM.

4.5 Evaluation on Large Scale Aerial Datasets

Our approach was also evaluated on large-scale aerial datasets, both on one computer sequentially (each cluster reconstructed one by one) and on three computers in parallel mode. The reconstruction results are given in Table 3. Our algorithm recovers the same number of cameras as COLMAP. Besides, when running on a single computer, our approach is about 6 times faster than COLMAP; when running on three computers in a distributed manner, it is about 17 times faster. It could be accelerated further with more computing resources. TheiaSfM is slightly slower than our approach, and its number of reconstructed 3D points is one order of magnitude smaller than ours. Though 1DSfM and LUD remain the most efficient, their robustness meets challenges on large-scale aerial datasets.

dataset Images COLMAP [DBLP:conf/cvpr/SchonbergerF16] TheiaSfM [DBLP:conf/mm/SweeneyHT15] 1DSfM [DBLP:conf/eccv/WilsonS14] LUD [DBLP:conf/cvpr/OzyesilS15] Ours
Aerial-5155 5155 5155 1798434 41823.27 4383 203942 3527.43 4591 243490 482.45 4723 278924 390.59 5155 1834875 2491.78 936.74
Aerial-7500 7500 7455 5184368 95007.59 5327 432347 6237.91 6264 478234 931.84 5934 467230 832.40 7455 4968142 5834.37 2166.59
Aerial-12306 12306 11259 3934391 146172 8347 478237 25783.12 8923 509543 4941.27 8534 489238 4589.73 11259 3916724 22663.86 8970.01
Table 3: Comparison of reconstruction results. For each method, the columns report the number of recovered cameras, the number of 3D points, and the total time in seconds; for our method, the last two columns give the total time on a single computer and the total time in a distributed system, respectively.
Figure 8: Reconstructions on large scale aerial datasets.

5 Conclusion

In this article, we proposed a new SfM pipeline called GraphSfM, based on graph theory, and designed a unified framework to solve large-scale SfM tasks. Our two-step graph clustering algorithm enhances the connections between clusters with the help of a MaxST. In the final fusion step, the construction of a MinST and an MHT allows us to pick the most accurate similarity transformations and to alleviate error accumulation. Thus, our GraphSfM is highly efficient and robust on large-scale datasets, and also shows superiority on ambiguous datasets compared with traditional state-of-the-art SfM approaches. Moreover, GraphSfM can easily be deployed on a distributed system, so the reconstruction is not limited by the scale of the dataset.